Abstract

We show how to exploit symmetries of a graph to efficiently compute the fastest mixing
Markov chain on the graph (i.e., find the transition probabilities on the edges to minimize the
second-largest eigenvalue modulus of the transition probability matrix). Exploiting symmetry
can lead to significant reduction in both the number of variables and the size of matrices in the
corresponding semidefinite program, thus enabling numerical solution of large-scale instances that
are otherwise computationally infeasible. We obtain analytic or semi-analytic results for particular
classes of graphs, such as edge-transitive and distance-transitive graphs. We describe two
general approaches for symmetry exploitation, based on orbit theory and block-diagonalization,
respectively. We also establish the connection between these two approaches.

1 Introduction

In the fastest mixing Markov chain problem, we choose the transition probabilities on the edges
of a graph to minimize the second-largest eigenvalue modulus of the transition probability matrix.
In [BDX04] we formulated this problem as a convex optimization problem, in particular as a
semidefinite program. Thus it can be solved, up to any given precision, in polynomial time by
interior-point methods. In this paper, we show how to exploit symmetries of a graph to make the
computation more efficient.


1.1 The fastest mixing Markov chain problem 

We consider an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with vertex set $\mathcal{V} = \{1, \ldots, n\}$ and edge set $\mathcal{E}$, and
assume that $\mathcal{G}$ is connected. We define a discrete-time Markov chain on the vertices as follows.
The state at time $t$ will be denoted $X(t) \in \mathcal{V}$, for $t = 0, 1, \ldots$. Each edge in the graph is associated
with a transition probability with which $X$ makes a transition between the two adjacent vertices.
This Markov chain can be described via its transition probability matrix $P \in \mathbf{R}^{n \times n}$, where

$$ P_{ij} = \mathrm{Prob}\,( X(t+1) = j \mid X(t) = i ), \quad i, j = 1, \ldots, n. $$

Note that $P_{ii}$ is the probability that $X(t)$ stays at vertex $i$, and $P_{ij} = 0$ for $\{i,j\} \notin \mathcal{E}$ (transitions
are allowed only between vertices that are linked by an edge). We assume that the transition
probabilities are symmetric, i.e., $P = P^T$, where the superscript $T$ denotes the transpose of a
matrix. Of course this transition probability matrix must also be stochastic:

$$ P \geq 0, \quad P\mathbf{1} = \mathbf{1}, $$

where the inequality $P \geq 0$ means elementwise, and $\mathbf{1}$ denotes the vector of all ones.

Since $P$ is symmetric and stochastic, the uniform distribution $(1/n)\mathbf{1}^T$ is stationary. In addition,
the eigenvalues of $P$ are real, and no more than one in modulus. We denote them in non-increasing
order

$$ 1 = \lambda_1(P) \geq \lambda_2(P) \geq \cdots \geq \lambda_n(P) \geq -1. $$

We denote by $\mu(P)$ the second-largest eigenvalue modulus (SLEM) of $P$, i.e.,

$$ \mu(P) = \max_{i=2,\ldots,n} |\lambda_i(P)| = \max\{ \lambda_2(P), \ -\lambda_n(P) \}. $$

This quantity is widely used to bound the asymptotic convergence rate of the distribution of the
Markov chain to its stationary distribution, in the total variation distance or chi-squared distance
(e.g., [DS91, DSC93]). In general the smaller $\mu(P)$ is, the faster the Markov chain converges. For
more background on Markov chains, eigenvalues and rapid mixing, see, e.g., the text [Bre99].
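
As a numerical illustration of the SLEM definition, the following Python sketch evaluates the second-largest eigenvalue modulus of a small symmetric stochastic matrix. It is not taken from the paper: the helper names `symmetric_eigenvalues` and `slem`, the cyclic Jacobi iteration, and the 4-cycle example are all assumptions chosen so that the example needs no external libraries. Any symmetric eigenvalue routine could be substituted.

```python
import math

def symmetric_eigenvalues(A, sweeps=100, tol=1e-24):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations
    (an illustrative choice of solver, not the method used in the paper)."""
    n = len(A)
    A = [list(row) for row in A]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((A[i][i] for i in range(n)), reverse=True)

def slem(P):
    """max{lambda_2(P), -lambda_n(P)} for a symmetric stochastic matrix P."""
    lam = symmetric_eigenvalues(P)
    return max(lam[1], -lam[-1])

# Assumed example: the 4-cycle with transition probability 1/3 on every edge.
P = [[1/3, 1/3, 0.0, 1/3],
     [1/3, 1/3, 1/3, 0.0],
     [0.0, 1/3, 1/3, 1/3],
     [1/3, 0.0, 1/3, 1/3]]
print(slem(P))
```

For this matrix the eigenvalues are 1, 1/3, 1/3 and -1/3, so the printed value should be close to 1/3.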

In [BDX04], we addressed the following problem: What choice of $P$ minimizes $\mu(P)$? In other
words, what is the fastest mixing (symmetric) Markov chain on the graph? This can be posed as
the following optimization problem:

$$ \begin{array}{ll} \mbox{minimize} & \mu(P) \\ \mbox{subject to} & P \geq 0, \quad P\mathbf{1} = \mathbf{1}, \quad P = P^T \\ & P_{ij} = 0, \quad \{i,j\} \notin \mathcal{E}. \end{array} \qquad (1) $$

Here $P$ is the optimization variable, and the graph is the problem data. We call this problem the
fastest mixing Markov chain (FMMC) problem. This is a convex optimization problem; in particular,
the objective function can be explicitly written in a convex form $\mu(P) = \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2$, where
$\|\cdot\|_2$ denotes the spectral norm of a matrix. Moreover, this problem can be readily transformed
into a semidefinite program (SDP):

$$ \begin{array}{ll} \mbox{minimize} & s \\ \mbox{subject to} & -sI \preceq P - (1/n)\mathbf{1}\mathbf{1}^T \preceq sI \\ & P \geq 0, \quad P\mathbf{1} = \mathbf{1}, \quad P = P^T \\ & P_{ij} = 0, \quad \{i,j\} \notin \mathcal{E}. \end{array} \qquad (2) $$

Here $I$ denotes the identity matrix, and the variables are the matrix $P$ and the scalar $s$. The symbol
$\preceq$ denotes matrix inequality, i.e., $X \preceq Y$ means $Y - X$ is positive semidefinite.

There has been some follow-up work on this problem. Boyd, Diaconis, Sun, and Xiao [BDSX06]
proved analytically that on an $n$-path the fastest mixing chain can be obtained by assigning the same
transition probability one half to the $n - 1$ edges and to the two loops at the two ends. Roch [Roc05] used
standard mixing-time analysis techniques (variational characterizations, conductance, canonical
paths) to bound the fastest mixing time. Gade and Overton [GO06] have considered the fastest
mixing problem for a nonreversible Markov chain. Here, the problem is non-convex and much
remains to be done. Finally, closed-form solutions of fastest mixing problems have recently been
applied in statistics to give a generalization of the usual spectral analysis of time series for more
general discrete data; see [Sal06].

1.2 Exploiting problem structure

The SDP formulation (2) means that the FMMC problem can be efficiently solved using standard
SDP solvers, at least for small or medium size problems (with number of edges up to a thousand
or so). General background on convex optimization and SDP can be found in, e.g., [NN94, VB96,
WSV00, BTN01, BV04]. The current SDP solvers (e.g., [Stu99, TTT99, YFK03]) mostly use
interior-point methods which have polynomial time worst-case complexity.

When solving the SDP (2) by interior-point methods, in each iteration we need to compute
the first and second derivatives of the logarithmic barrier functions (or potential functions) for
the matrix inequalities, and assemble and solve a linear system of equations (the Newton system).
Let $n$ be the number of vertices and $m$ be the number of edges in the graph (equivalently, $m$ is the
number of variables in the problem). The Newton system is a set of $m$ linear equations with $m$
unknowns. Without exploiting any structure, the number of flops per iteration in a typical barrier
method is on the order $\max\{mn^3, m^2n^2, m^3\}$, where the first two terms come from computing and
assembling the Newton system, and the third term amounts to solving it (see, e.g., [BV04, §11.8.3]).
(Other variants of interior-point methods have similar orders of flop count.)

Exploiting problem structure can lead to significant improvement of solution efficiency. As for
many other problems defined on a graph, sparsity is the most obvious structure to consider here. In
fact, many current SDP solvers already exploit sparsity. However, it is well known that exploiting
sparsity alone in interior-point methods for SDP has limited effectiveness. The sparsity of $P$, and
the sparsity plus rank-one structure of $P - (1/n)\mathbf{1}\mathbf{1}^T$, can be exploited to significantly reduce the
complexity of assembling the Newton system, but typically the Newton system itself is dense. The
computational cost per iteration can be reduced to order $O(m^3)$, dominated by solving the dense
linear system (see analysis for similar problems in, e.g., [BYZ00, XB04, XBK07]).

In addition to using interior-point methods for the SDP formulation (2), we can also solve the
FMMC problem in the form (1) by subgradient-type (first-order) methods. The subgradients of
$\mu(P)$ can be obtained by computing the extreme eigenvalues and associated eigenvectors of the
matrix $P$. This can be done very efficiently by iterative methods, specifically the Lanczos method,
for large sparse symmetric matrices (e.g., [GL96, Saa92]). Compared with interior-point methods,
subgradient-type methods can solve much larger problems but only to a moderate accuracy (they
do not have polynomial-time worst-case complexity). In [BDX04], we used a simple subgradient
method to solve the FMMC problem on graphs with up to a few hundred thousand edges. More
sophisticated first-order methods for solving large-scale eigenvalue optimization problems and SDPs
have been reported in, e.g., [HR00, BM03, Nem04, LNM04, Nes05]. A successive partial linear
programming method was developed in [Ove92].

In this paper, we focus on the FMMC problem on graphs with large symmetry groups, and
show how to exploit symmetries of the graph to make the computation more efficient. A result by
Erdős and Rényi [ER63] states that with probability one, the symmetry group of a (suitably defined)
random graph is trivial, i.e., it contains only the identity element. Nevertheless, many of the graphs
of theoretical and practical interest, particularly in engineering applications, have very interesting,
and sometimes very large, symmetry groups. Symmetry reduction techniques have been explored
in several different contexts, e.g., dynamical systems and bifurcation theory [GSS88], polynomial
system solving [Gat00, Wor94], numerical solution of partial differential equations [FS92], and Lie
symmetry analysis in geometric mechanics [MR99]. In the context of optimization, a class of SDPs
with symmetry has been defined in [KOMK01], where the authors study the invariance properties
of the search directions of primal-dual interior-point methods. In addition, symmetry has been
exploited to prune the enumeration tree in branch-and-cut algorithms for integer programming
[Mar03], and to reduce matrix size in a spectral radius optimization problem [HOY03].

Closely related to our approach in this paper, the recent work [dKPS07] considers general SDPs
that are invariant under the action of a permutation group, and develops a technique based
on matrix *-representation to reduce problem size. This technique has been applied to simplify
computations in SDP relaxations for graph coloring and maximal clique problems [DR07], and to
strengthen SDP bounds for some coding problems [Lau07].

For the FMMC problem, we show that exploiting symmetry allows significant reduction in both 
number of optimization variables and size of matrices. Effectively, they correspond to reducing m 
and n, respectively, in the flop counts for interior-point methods mentioned above. The problem 
can be considerably simplified and is often solvable analytically by only exploiting symmetry. We 
present two general approaches for symmetry exploitation, based on orbit theory [BDPX05] and 
block-diagonalization [GP04], respectively. We also establish the connection between these two 
approaches. 

1.3 Outline 

In §2, we explain the concepts of graph automorphisms and the automorphism group (symmetry
group) of a graph. We show that the FMMC problem always attains its optimum in the fixed-point
subset of the feasible set under the automorphism group. This allows us to only consider a number
of distinct transition probabilities that equals the number of orbits of the edges. We then give
a formulation of the FMMC problem with reduced number of variables (transition probabilities),
which appears to be very convenient in subsequent sections.

In §3, we give closed-form solutions for the FMMC problem on some special classes of graphs,
namely edge-transitive graphs and distance-transitive graphs. Along the way we also discuss FMMC
on graphs formed by taking Cartesian products of simple graphs.

In §4, we first review the orbit theory for reversible Markov chains, and give sufficient conditions
for constructing an orbit chain that contains all distinct eigenvalues of the original chain. This orbit
chain is usually no longer symmetric but always reversible. We then solve the fastest reversible
Markov chain problem on the orbit graph, from which we immediately obtain an optimal solution to
the original FMMC problem.

In §5, we review some group representation theory and show how to block diagonalize the linear
matrix inequalities in the FMMC problem by constructing a symmetry-adapted basis. The resulting
blocks usually have much smaller sizes and repeated blocks can be discarded in computation.
Extensive examples in §4 and §5 reveal interesting connections between these two general symmetry
reduction methods.

In §6, we conclude the paper by pointing out some possible future work.

2 Symmetry analysis 

In this section we explain the basic concepts that are essential in exploiting graph symmetry, and 
derive our result on reducing the number of optimization variables in the FMMC problem. 

2.1 Graph automorphisms and classes 

The study of graphs that possess particular kinds of symmetry properties has a long history. The 
basic object of study is the automorphism group of a graph, and different classes can be defined 
depending on the specific form in which the group acts on the vertices and edges. 

An automorphism of a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a permutation $\sigma$ of $\mathcal{V}$ such that $\{i,j\} \in \mathcal{E}$ if and
only if $\{\sigma(i), \sigma(j)\} \in \mathcal{E}$. The (full) automorphism group of the graph, denoted by $\mathrm{Aut}(\mathcal{G})$, is the set
of all such permutations, with the group operation being composition. For instance, for the graph
on the left in Figure 1, the corresponding automorphism group is generated by all permutations of
the vertices $\{1,2,3\}$. This group, isomorphic to the symmetric group $S_3$, has six elements, namely
the permutations $123 \to 123$ (the identity), $123 \to 213$, $123 \to 132$, $123 \to 321$, $123 \to 231$, and
$123 \to 312$. Note that vertex 4 cannot be permuted with any other vertex.

Figure 1: The graph on the left side is edge-transitive, but not vertex-transitive. The one on the
right side is vertex-transitive, but not edge-transitive.

Recall that an action of a group $G$ on a set $X$ is a homomorphism from $G$ to the set of all
permutations of the elements in $X$ (i.e., the symmetric group of degree $|X|$). For an element
$x \in X$, the set of all images $g(x)$, as $g$ varies through $G$, is called the orbit of $x$. Distinct orbits
form equivalence classes and they partition the set $X$. The action is transitive if for every pair of
elements $x, y \in X$, there is a group element $g \in G$ such that $g(x) = y$. In other words, the action
is transitive if there is only a single orbit in $X$.

A graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is said to be vertex-transitive if $\mathrm{Aut}(\mathcal{G})$ acts transitively on $\mathcal{V}$. The action
of a permutation $\sigma$ on $\mathcal{V}$ induces an action on $\mathcal{E}$ with the rule $\sigma(\{i,j\}) = \{\sigma(i), \sigma(j)\}$. A graph
$\mathcal{G}$ is edge-transitive if $\mathrm{Aut}(\mathcal{G})$ acts transitively on $\mathcal{E}$. Graphs can be edge-transitive without being
vertex-transitive and vice versa; simple examples are shown in Figure 1.
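
The orbit and transitivity definitions above can be checked mechanically on small graphs. The sketch below is illustrative only: the function name `orbits`, the dictionary encoding of permutations, and the example graph (a star with center 4 and leaves 1, 2, 3, assumed here to be consistent with the description of the left graph in Figure 1) are assumptions, not material from the paper. It closes each element under a set of generators, which suffices for a finite permutation group.

```python
def orbits(elements, generators, act):
    """Partition `elements` into orbits under the group generated by `generators`.
    `act(g, x)` returns the image of x under the permutation g."""
    seen, result = set(), []
    for x in elements:
        if x in seen:
            continue
        orbit, frontier = {x}, [x]
        while frontier:
            y = frontier.pop()
            for g in generators:
                z = act(g, y)
                if z not in orbit:
                    orbit.add(z)
                    frontier.append(z)
        seen |= orbit
        result.append(orbit)
    return result

# Assumed example: star graph with center 4 and leaves 1, 2, 3.
vertices = [1, 2, 3, 4]
edges = [frozenset({1, 4}), frozenset({2, 4}), frozenset({3, 4})]
swap12 = {1: 2, 2: 1, 3: 3, 4: 4}      # transposition of vertices 1 and 2
cycle123 = {1: 2, 2: 3, 3: 1, 4: 4}    # 3-cycle on vertices 1, 2, 3
generators = [swap12, cycle123]

vertex_orbits = orbits(vertices, generators, lambda g, v: g[v])
edge_orbits = orbits(edges, generators, lambda g, e: frozenset(g[i] for i in e))
print(len(vertex_orbits), len(edge_orbits))
```

In this example there are two vertex orbits (the leaves and the center) and a single edge orbit, so the assumed graph is edge-transitive but not vertex-transitive.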

A graph is 1-arc-transitive if, given any four vertices $u, v, x, y$ with $\{u,v\}, \{x,y\} \in \mathcal{E}$, there
exists an automorphism $g \in \mathrm{Aut}(\mathcal{G})$ such that $g(u) = x$ and $g(v) = y$. Notice that, as opposed to
edge-transitivity, here the ordering of the vertices is important, even for undirected graphs. In fact,
a 1-arc-transitive graph must be both vertex-transitive and edge-transitive, but the converse may
not be true. The 1-arc-transitive graphs are called symmetric graphs in [Big74], but the modern
use extends this term to all graphs that are simultaneously edge- and vertex-transitive. Finally,
let $\delta(u,v)$ denote the distance between two vertices $u, v \in \mathcal{V}$. A graph is called distance-transitive
if, for any four vertices $u, v, x, y$ with $\delta(u,v) = \delta(x,y)$, there is an automorphism $g \in \mathrm{Aut}(\mathcal{G})$ such
that $g(u) = x$ and $g(v) = y$.

The containment relationship among the four classes of graphs described above is illustrated
in Figure 2. Explicit counterexamples are known for each of the non-inclusions. It is generally
believed that distance-transitive graphs have been completely classified. This work has been done
by classifying the distance-regular graphs. It would take us too far afield to give a complete
discussion. See the survey in [DSC06, Section 7].

The concept of graph automorphism can be naturally extended to weighted graphs, by requiring 
that the permutation must also preserve the weights on the edges (e.g., [BDPX05]). This extension 
allows us to exploit symmetry in more general reversible Markov chains, where the transition 
probability matrix is not necessarily symmetric. 

[Figure 2 diagram, with classes labeled Distance-transitive, 1-arc-transitive, Edge-transitive, and Vertex-transitive.]

Figure 2: Classes of symmetric graphs, and their inclusion relationship.

2.2 FMMC with symmetry constraints 

A permutation $\sigma \in \mathrm{Aut}(\mathcal{G})$ can be represented by a permutation matrix $Q$, where $Q_{ij} = 1$ if
$i = \sigma(j)$ and $Q_{ij} = 0$ otherwise. The permutation $\sigma$ induces an action on the transition probability
matrix by $\sigma(P) = QPQ^T$. We denote the feasible set of the FMMC problem (1) by $\mathcal{C}$, i.e.,

$$ \mathcal{C} = \{ P \in \mathbf{R}^{n \times n} \mid P \geq 0, \ P\mathbf{1} = \mathbf{1}, \ P = P^T, \ P_{ij} = 0 \mbox{ for } \{i,j\} \notin \mathcal{E} \}. $$

This set is invariant under the action of graph automorphism. To see this, let $h = \sigma(i)$ and $k = \sigma(j)$.
Then we have

$$ (\sigma(P))_{hk} = (QPQ^T)_{hk} = \sum_l (QP)_{hl} Q_{kl} = (QP)_{hj} = \sum_l Q_{hl} P_{lj} = P_{ij}. $$

Since $\sigma$ is a graph automorphism, we have $\{h,k\} \in \mathcal{E}$ if and only if $\{i,j\} \in \mathcal{E}$, so the sparsity
pattern of the probability transition matrix is preserved. It is straightforward to verify that the
conditions $P \geq 0$, $P\mathbf{1} = \mathbf{1}$, and $P = P^T$ are also preserved under this action.
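Let $\mathcal{F}$ denote the fixed-point subset of $\mathcal{C}$ under the action of $\mathrm{Aut}(\mathcal{G})$; i.e.,

$$ \mathcal{F} = \{ P \in \mathcal{C} \mid \sigma(P) = P, \ \sigma \in \mathrm{Aut}(\mathcal{G}) \}. $$

We have the following theorem (see also [GP04, Theorem 3.3]).

Theorem 2.1. The FMMC problem always has an optimal solution in the fixed-point subset $\mathcal{F}$.

Proof. Let $\mu^\star$ denote the optimal value of the FMMC problem (1), i.e., $\mu^\star = \inf\{\mu(P) \mid P \in \mathcal{C}\}$.
Since the objective function $\mu$ is continuous and the feasible set $\mathcal{C}$ is compact, there is at least one
optimal transition matrix $P^\star$ such that $\mu(P^\star) = \mu^\star$. Let $\bar{P}$ denote the average over the orbit of $P^\star$
under $\mathrm{Aut}(\mathcal{G})$:

$$ \bar{P} = \frac{1}{|\mathrm{Aut}(\mathcal{G})|} \sum_{\sigma \in \mathrm{Aut}(\mathcal{G})} \sigma(P^\star). $$

This matrix is feasible because each $\sigma(P^\star)$ is feasible and the feasible set is convex. By construction,
it is also invariant under the actions of $\mathrm{Aut}(\mathcal{G})$. Moreover, using the convexity of $\mu$, we have
$\mu(\bar{P}) \leq \mu(P^\star)$. It follows that $\bar{P} \in \mathcal{F}$ and $\mu(\bar{P}) = \mu^\star$. $\Box$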
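
The averaging step in the proof can be illustrated on a tiny example. The sketch below is a hypothetical illustration, not code from the paper: the names `act` and `orbit_average`, the tuple encoding of permutations, and the 3-vertex path with its two automorphisms (identity and reflection) are assumptions. It starts from a feasible matrix that is not invariant, averages it over the full group, and the result is feasible and invariant.

```python
from fractions import Fraction as F

def act(sigma, P):
    """Apply a permutation to P: entry (sigma(i), sigma(j)) of the result is P[i][j]."""
    n = len(P)
    out = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[sigma[i]][sigma[j]] = P[i][j]
    return out

def orbit_average(group, P):
    """Average of sigma(P) over every element of `group` (the whole group, not generators)."""
    n = len(P)
    images = [act(s, P) for s in group]
    return [[sum(M[i][j] for M in images) / len(group) for j in range(n)]
            for i in range(n)]

# Assumed example: path 0 - 1 - 2, whose automorphisms are the identity and the reflection.
group = [(0, 1, 2), (2, 1, 0)]
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(0),    F(1, 4), F(3, 4)]]   # feasible, but not invariant under the reflection

P_bar = orbit_average(group, P)
print(P_bar[0][1], P_bar[1][2])     # both edge probabilities become 3/8
```

After averaging, the two edges of the path, which form a single orbit, carry the same transition probability.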

As a result of Theorem 2.1, we can replace the constraint $P \in \mathcal{C}$ by $P \in \mathcal{F}$ in the FMMC
problem and get the same optimal value. In the fixed-point subset $\mathcal{F}$, the transition probabilities
on the edges within an orbit must be the same. So we have the following corollaries:

Corollary 2.2. The number of distinct edge transition probabilities we need to consider in the
FMMC problem is at most equal to the number of orbits of $\mathcal{E}$ under $\mathrm{Aut}(\mathcal{G})$.

Corollary 2.3. If $\mathcal{G}$ is edge-transitive, then all the edge transition probabilities can be assigned the
same value.

Note that the holding probability at the vertices can always be eliminated using $P_{ii} = 1 - \sum_{j \neq i} P_{ij}$.
So it suffices to only consider the edge transition probabilities.

2.3 Formulation with reduced number of variables 

From the results of the previous section, we can reduce the number of optimization variables in the 
FMMC problem from the number of edges to the number of edge orbits under the automorphism 
group. Here we give an explicit parametrization of the FMMC problem with the reduced number 
of variables. This parametrization is also the precise characterization of the fixed-point subset $\mathcal{F}$.

Recall that the adjacency matrix of a graph with $n$ vertices is an $n \times n$ matrix $A$ whose entries are
given by $A_{ij} = 1$ if $\{i,j\} \in \mathcal{E}$ and $A_{ij} = 0$ otherwise. Let $\nu_i$ be the valency (degree) of vertex $i$. The
Laplacian matrix of the graph is given by $L = \mathrm{Diag}(\nu_1, \nu_2, \ldots, \nu_n) - A$, where $\mathrm{Diag}(\nu)$ denotes a
diagonal matrix with the vector $\nu$ as its diagonal. Extensive accounts of the Laplacian matrix and
its use in algebraic graph theory are provided in, e.g., [Mer94, Chu97, GR01].

Suppose that there are $N$ orbits of edges under the action of $\mathrm{Aut}(\mathcal{G})$. For each orbit, we define
an orbit graph $\mathcal{G}_k = (\mathcal{V}, \mathcal{E}_k)$, where $\mathcal{E}_k$ is the set of edges in the $k$th orbit. Note that the orbit graphs
are disconnected (there are disconnected vertices) if the original graph is not edge-transitive. Let
$L_k$ be the Laplacian matrix of $\mathcal{G}_k$. Note that the diagonal entry $(L_k)_{ii}$ equals the valency of node $i$
in $\mathcal{G}_k$ (which is zero if vertex $i$ is disconnected from all other vertices in $\mathcal{G}_k$).

By Corollary 2.2, we can assign the same transition probability on all the edges in the $k$th orbit.
Denote this transition probability by $p_k$ and let $p = (p_1, \ldots, p_N)$. Then the transition probability
matrix can be written as

$$ P(p) = I - \sum_{k=1}^N p_k L_k. \qquad (3) $$

This parametrization of the transition probability matrix automatically satisfies the constraints
$P = P^T$, $P\mathbf{1} = \mathbf{1}$, and $P_{ij} = 0$ for $\{i,j\} \notin \mathcal{E}$. The entry-wise nonnegativity constraint $P \geq 0$ now
translates into

$$ \begin{array}{ll} p_k \geq 0, & k = 1, \ldots, N \\ \sum_{k=1}^N (L_k)_{ii} \, p_k \leq 1, & i = 1, \ldots, n \end{array} $$

where the first set of constraints are for the off-diagonal entries of $P$, and the second set of
constraints are for the diagonal entries of $P$.
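
To make the parametrization concrete, the following sketch builds the matrix in (3) from a list of edge orbits and checks the two sets of inequality constraints. It is an illustration under stated assumptions: the function names (`laplacian`, `transition_matrix`, `is_feasible`) are hypothetical, and the example graph is the path on 4 vertices, whose edges split into two orbits under reflection (the two outer edges and the middle edge).

```python
from fractions import Fraction as F

def laplacian(n, edges):
    """Laplacian of the graph on vertices 0..n-1 with the given edge list."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L

def transition_matrix(n, edge_orbits, p):
    """P(p) = I - sum_k p_k L_k, following the parametrization (3)."""
    P = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for p_k, orbit in zip(p, edge_orbits):
        L_k = laplacian(n, orbit)
        for i in range(n):
            for j in range(n):
                P[i][j] -= p_k * L_k[i][j]
    return P

def is_feasible(n, edge_orbits, p):
    """Check p_k >= 0 for all k and sum_k (L_k)_ii p_k <= 1 for all i."""
    Ls = [laplacian(n, orbit) for orbit in edge_orbits]
    diag_ok = all(sum(p_k * L[i][i] for p_k, L in zip(p, Ls)) <= 1 for i in range(n))
    return all(p_k >= 0 for p_k in p) and diag_ok

# Assumed example: path 0 - 1 - 2 - 3 with edge orbits {01, 23} and {12}.
n = 4
edge_orbits = [[(0, 1), (2, 3)], [(1, 2)]]
p = (F(1, 2), F(1, 3))
P = transition_matrix(n, edge_orbits, p)
print(P[1], is_feasible(n, edge_orbits, p))
```

With these values the holding probability at vertex 1 is 1 - 1/2 - 1/3 = 1/6, and increasing the second probability to 2/3 would violate the diagonal constraint at that vertex.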

It can be verified that the parametrization (3), together with the above inequality constraints,
is the precise characterization of the fixed-point subset $\mathcal{F}$. Therefore we can explicitly write the
FMMC problem restricted to the fixed-point subset as

$$ \begin{array}{ll} \mbox{minimize} & \mu\left( I - \sum_{k=1}^N p_k L_k \right) \\ \mbox{subject to} & p_k \geq 0, \quad k = 1, \ldots, N \\ & \sum_{k=1}^N (L_k)_{ii} \, p_k \leq 1, \quad i = 1, \ldots, n. \end{array} \qquad (4) $$

Later in this paper, we will also need the corresponding SDP formulation

$$ \begin{array}{ll} \mbox{minimize} & s \\ \mbox{subject to} & -sI \preceq I - \sum_{k=1}^N p_k L_k - (1/n)\mathbf{1}\mathbf{1}^T \preceq sI \\ & p_k \geq 0, \quad k = 1, \ldots, N \\ & \sum_{k=1}^N (L_k)_{ii} \, p_k \leq 1, \quad i = 1, \ldots, n. \end{array} \qquad (5) $$

3 Some analytic results

For some special classes of graphs, the FMMC problem can be considerably simplified and often
solved by only exploiting symmetry. In this section, we give some analytic results for the FMMC
problem on edge-transitive graphs, Cartesian products of simple graphs, and distance-transitive
graphs (a subclass of edge-transitive graphs). The optimal solution is often expressed in terms of
the eigenvalues of the adjacency matrix or the Laplacian matrix of the graph. It is interesting to
notice that even for such highly structured classes of graphs, neither the maximum-degree nor the
Metropolis-Hastings heuristics discussed in [BDX04] give the optimal solution. Throughout, we use
$\alpha^\star$ to denote the common edge weight of the fastest mixing chain and $\mu^\star$ to denote the optimal
SLEM.



3.1 FMMC on edge-transitive graphs 

Theorem 3.1. Suppose the graph $\mathcal{G}$ is edge-transitive, and let $\alpha$ be the transition probability
assigned on all the edges. Then the optimal solution of the FMMC problem is

$$ \alpha^\star = \min\left\{ \frac{1}{\nu_{\max}}, \ \frac{2}{\lambda_1(L) + \lambda_{n-1}(L)} \right\} \qquad (6) $$

$$ \mu^\star = \max\left\{ 1 - \frac{\lambda_{n-1}(L)}{\nu_{\max}}, \ \frac{\lambda_1(L) - \lambda_{n-1}(L)}{\lambda_1(L) + \lambda_{n-1}(L)} \right\} \qquad (7) $$

where $\nu_{\max} = \max_{i \in \mathcal{V}} \nu_i$ is the maximum valency of the vertices in the graph, and $L$ is the Laplacian
matrix defined in §2.3.

Proof. By definition of an edge-transitive graph, there is a single orbit of edges under the actions
of its automorphism group. Therefore we can assign the same transition probability $\alpha$ on all the
edges in the graph (Corollary 2.3), and the parametrization (3) becomes $P = I - \alpha L$. So we have

$$ \lambda_i(P) = 1 - \alpha \lambda_{n+1-i}(L), \quad i = 1, \ldots, n $$

and the SLEM

$$ \mu(P) = \max\{ \lambda_2(P), \ -\lambda_n(P) \} = \max\{ 1 - \alpha\lambda_{n-1}(L), \ \alpha\lambda_1(L) - 1 \}. $$

To minimize $\mu(P)$, we let $1 - \alpha\lambda_{n-1}(L) = \alpha\lambda_1(L) - 1$ and get $\alpha = 2/(\lambda_1(L) + \lambda_{n-1}(L))$.
But the nonnegativity constraint $P \geq 0$ requires that the transition probability must also satisfy
$0 < \alpha \leq 1/\nu_{\max}$. Combining these two conditions gives the optimal solution (6) and (7). $\Box$
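
The closed form in Theorem 3.1 is easy to evaluate once the extreme Laplacian eigenvalues are known. The sketch below is illustrative: the function name `fmmc_edge_transitive` is hypothetical, integer inputs are assumed so that exact rational arithmetic can be used, and the two example spectra (the star with three leaves, Laplacian eigenvalues 4, 1, 1, 0, and the 4-cycle, eigenvalues 4, 2, 2, 0) are standard facts brought in for illustration rather than examples from the paper.

```python
from fractions import Fraction as F

def fmmc_edge_transitive(nu_max, lam_1, lam_nm1):
    """Evaluate (6) and (7): nu_max is the maximum valency, lam_1 the largest and
    lam_nm1 the second-smallest Laplacian eigenvalue (integers assumed here)."""
    alpha = min(F(1, nu_max), F(2) / (lam_1 + lam_nm1))
    mu = max(1 - F(lam_nm1) / nu_max, F(lam_1 - lam_nm1) / (lam_1 + lam_nm1))
    return alpha, mu

# Star with three leaves: the valency bound 1/nu_max is the active term.
alpha_star, mu_star = fmmc_edge_transitive(3, 4, 1)
print(alpha_star, mu_star)

# Consistency check against mu(P) = max{1 - alpha*lam_{n-1}, alpha*lam_1 - 1}.
print(max(1 - alpha_star * 1, alpha_star * 4 - 1))

# 4-cycle: the eigenvalue-balancing term is the active one.
print(fmmc_edge_transitive(2, 4, 2))
```

The two examples exercise both branches of the minimum in (6): in the star the nonnegativity constraint limits the edge probability, while in the 4-cycle the two extreme eigenvalues of $P$ are balanced.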

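We give two examples of FMMC on edge-transitive graphs.

3.1.1 Cycles

The first example is the cycle graph $C_n$; see Figure 3. The Laplacian matrix is

$$ L = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & -1 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ -1 & 0 & 0 & \cdots & -1 & 2 \end{bmatrix}, $$

Figure 3: The cycle graph $C_n$ with $n = 9$.

which has eigenvalues

$$ 2 - 2\cos\frac{2k\pi}{n}, \quad k = 1, \ldots, n. $$

The two extreme eigenvalues are

$$ \lambda_1(L) = 2 - 2\cos\frac{2\lfloor n/2 \rfloor \pi}{n}, \qquad \lambda_{n-1}(L) = 2 - 2\cos\frac{2\pi}{n}, $$

where $\lfloor n/2 \rfloor$ denotes the largest integer that is no larger than $n/2$, which is $n/2$ for $n$ even or
$(n-1)/2$ for $n$ odd. By Theorem 3.1, the optimal solution to the FMMC problem is

$$ \alpha^\star = \frac{1}{2 - \cos\frac{2\pi}{n} - \cos\frac{2\lfloor n/2 \rfloor \pi}{n}} \qquad (8) $$

$$ \mu^\star = \frac{\cos\frac{2\pi}{n} - \cos\frac{2\lfloor n/2 \rfloor \pi}{n}}{2 - \cos\frac{2\pi}{n} - \cos\frac{2\lfloor n/2 \rfloor \pi}{n}}. \qquad (9) $$

When $n \to \infty$, the transition probability $\alpha^\star \to 1/2$ and the SLEM $\mu^\star \to 1 - 2\pi^2/n^2$.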
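
The cycle formulas and their large-$n$ behavior can be checked numerically. The following sketch is an illustration only; the function name `cycle_fmmc` and the sample values of $n$ are assumptions. It evaluates the closed-form expressions for the optimal edge probability and SLEM of the cycle and prints them next to the asymptotic value $1 - 2\pi^2/n^2$.

```python
import math

def cycle_fmmc(n):
    """Closed-form optimal edge probability and SLEM for the cycle C_n."""
    c1 = math.cos(2 * math.pi / n)
    c2 = math.cos(2 * (n // 2) * math.pi / n)   # n // 2 is floor(n / 2)
    denom = 2 - c1 - c2
    return 1 / denom, (c1 - c2) / denom

for n in (4, 9, 100, 1000):
    alpha, mu = cycle_fmmc(n)
    print(n, alpha, mu, 1 - 2 * math.pi ** 2 / n ** 2)
```

For $n = 4$ both quantities equal 1/3, and for large $n$ the edge probability approaches 1/2 from below while the SLEM approaches the stated asymptote.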



3.1.2 Complete bipartite graphs

The complete bipartite graph, denoted $K_{m,n}$, has two subsets of vertices with cardinalities $m$ and $n$
respectively. Each vertex in a subset is connected to all the vertices in the other subset, and is not
connected to any of the vertices in its own subset; see Figure 4. Without loss of generality, assume
$m \leq n$. So the maximum degree is $\nu_{\max} = n$. This graph is edge-transitive but, for $m < n$, not vertex-transitive.
The Laplacian matrix of this graph is

$$ L = \begin{bmatrix} nI_m & -\mathbf{1}_{m \times n} \\ -\mathbf{1}_{n \times m} & mI_n \end{bmatrix}, $$

Figure 4: The complete bipartite graph $K_{m,n}$ with $m = 3$ and $n = 4$.

where $I_m$ denotes the $m$ by $m$ identity matrix, and $\mathbf{1}_{m \times n}$ denotes the $m$ by $n$ matrix whose
components are all ones. For $n \geq m \geq 2$, this matrix has four distinct eigenvalues $m + n$, $n$, $m$ and
$0$, with multiplicities $1$, $m - 1$, $n - 1$ and $1$, respectively (see §5.3.1). By Theorem 3.1, the optimal
transition probability on the edges and the corresponding SLEM are

$$ \alpha^\star = \min\left\{ \frac{1}{n}, \ \frac{2}{n + 2m} \right\} \qquad (10) $$

$$ \mu^\star = \max\left\{ \frac{n - m}{n}, \ \frac{n}{n + 2m} \right\}. \qquad (11) $$



3.2 Cartesian product of graphs 

Many graphs we consider can be constructed by taking Cartesian products of simpler graphs. The
Cartesian product of two graphs $\mathcal{G}_1 = (\mathcal{V}_1, \mathcal{E}_1)$ and $\mathcal{G}_2 = (\mathcal{V}_2, \mathcal{E}_2)$ is a graph with vertex set $\mathcal{V}_1 \times \mathcal{V}_2$,
where two vertices $(u_1, u_2)$ and $(v_1, v_2)$ are connected by an edge if and only if $u_1 = v_1$ and
$\{u_2, v_2\} \in \mathcal{E}_2$, or $u_2 = v_2$ and $\{u_1, v_1\} \in \mathcal{E}_1$. Let $\mathcal{G}_1 \oplus \mathcal{G}_2$ denote this Cartesian product. Its
Laplacian matrix is given by

$$ L_{\mathcal{G}_1 \oplus \mathcal{G}_2} = L_{\mathcal{G}_1} \otimes I_{|\mathcal{V}_2|} + I_{|\mathcal{V}_1|} \otimes L_{\mathcal{G}_2} \qquad (12) $$

where $\otimes$ denotes the matrix Kronecker product ([Gra81]). The eigenvalues of $L_{\mathcal{G}_1 \oplus \mathcal{G}_2}$ are given by

$$ \lambda_i(L_{\mathcal{G}_1}) + \lambda_j(L_{\mathcal{G}_2}), \quad i = 1, \ldots, |\mathcal{V}_1|, \quad j = 1, \ldots, |\mathcal{V}_2| \qquad (13) $$

where each eigenvalue is obtained as many times as its multiplicity (e.g., [Moh97]). The adjacency
matrix of the Cartesian product of graphs also has similar properties, which we will use later for
distance-transitive graphs. Detailed background on spectral graph theory can be found in, e.g.,
[Big74, DCS80, Chu97, GR01].

Combining Theorem 3.1 and the above expression for eigenvalues, we can easily obtain solutions
to the FMMC problem on graphs formed by taking Cartesian products of simple graphs.

3.2.1 Two-dimensional meshes

Here we consider the two-dimensional mesh with wraparounds at two ends of each row and column;
see Figure 5. It is the Cartesian product of two copies of $C_n$. We write it as $M_n = C_n \oplus C_n$. By
equation (13), its Laplacian matrix has eigenvalues

$$ 4 - 2\cos\frac{2i\pi}{n} - 2\cos\frac{2j\pi}{n}, \quad i, j = 1, \ldots, n. $$

Figure 5: The two-dimensional mesh with wraparounds $M_n$ with $n = 4$.

By Theorem 3.1, we obtain the optimal transition probability

$$ \alpha^\star = \frac{1}{3 - 2\cos\frac{2\lfloor n/2 \rfloor \pi}{n} - \cos\frac{2\pi}{n}} $$

and the smallest SLEM

$$ \mu^\star = \frac{1 - 2\cos\frac{2\lfloor n/2 \rfloor \pi}{n} + \cos\frac{2\pi}{n}}{3 - 2\cos\frac{2\lfloor n/2 \rfloor \pi}{n} - \cos\frac{2\pi}{n}}. $$

When $n \to \infty$, the transition probability $\alpha^\star \to 1/4$ and the SLEM $\mu^\star \to 1 - \pi^2/n^2$.
3.2.2 Hypercubes 

The $d$-dimensional hypercube, denoted $Q_d$, has $2^d$ vertices, each labeled with a binary word of
length $d$. Two vertices are connected by an edge if their words differ in exactly one component
(see Figure 6). This graph is isomorphic to the Cartesian product of $d$ copies of $K_2$, the complete
graph with two vertices. The Laplacian of $K_2$ is

$$ L_{K_2} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, $$

whose two eigenvalues are $0$ and $2$. The one-dimensional hypercube $Q_1$ is just $K_2$. Higher dimensional
hypercubes are defined recursively:

$$ Q_{k+1} = Q_k \oplus K_2, \quad k = 1, 2, \ldots. $$

By equation (12), their Laplacian matrices are

$$ L_{Q_{k+1}} = L_{Q_k} \otimes I_2 + I_{2^k} \otimes L_{K_2}, \quad k = 1, 2, \ldots. $$

Using equation (13) recursively, the Laplacian of $Q_d$ has eigenvalues $2k$, $k = 0, 1, \ldots, d$, each with
multiplicity $\binom{d}{k}$. The FMMC is achieved for:

$$ \alpha^\star = \frac{1}{d+1}, \qquad \mu^\star = \frac{d-1}{d+1}. $$

This solution has also been worked out, for example, in [Moh97].
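
The recursive use of the eigenvalue-sum rule for Cartesian products can be reproduced in a few lines. The sketch below is illustrative and makes some assumptions: spectra are stored as `Counter` objects mapping each distinct eigenvalue to its multiplicity, the names `product_spectrum` and `hypercube_spectrum` are hypothetical, and the final step applies the edge-transitive closed form with valency $d$.

```python
from collections import Counter
from fractions import Fraction as F
from math import comb

def product_spectrum(spec1, spec2):
    """Laplacian spectrum of a Cartesian product: all pairwise sums of eigenvalues,
    with multiplicities multiplied."""
    out = Counter()
    for a, mult_a in spec1.items():
        for b, mult_b in spec2.items():
            out[a + b] += mult_a * mult_b
    return out

K2 = Counter({0: 1, 2: 1})   # Laplacian eigenvalues of K_2

def hypercube_spectrum(d):
    """Spectrum of the d-dimensional hypercube, built as a product of d copies of K_2."""
    spec = K2
    for _ in range(d - 1):
        spec = product_spectrum(spec, K2)
    return spec

d = 5
spec = hypercube_spectrum(d)
distinct = sorted(spec)                      # ascending; 0 has multiplicity 1
lam_1, lam_nm1 = distinct[-1], distinct[1]   # largest and second-smallest eigenvalue
alpha = min(F(1, d), F(2, lam_1 + lam_nm1))
mu = max(1 - F(lam_nm1, d), F(lam_1 - lam_nm1, lam_1 + lam_nm1))
print(dict(spec), alpha, mu)
```

For $d = 5$ the multiplicities are the binomial coefficients 1, 5, 10, 10, 5, 1, and the resulting values are 1/6 and 2/3.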

Figure 6: The hypercubes $Q_1$, $Q_2$ and $Q_3$.

3.3 FMMC on distance-transitive graphs

Distance-transitive graphs have been studied extensively in the literature (see, e.g., [BCN89]).
In particular, they are both edge- and vertex-transitive. In previous examples, the cycles and the
hypercubes are actually distance-transitive graphs; so are the complete bipartite graphs when the two parts
have equal numbers of vertices.

In a distance-transitive graph, all vertices have the same valency, which we denote by $\nu$. The
Laplacian matrix can be written as $L = \nu I - A$, with $A$ being the adjacency matrix. Therefore

$$ \lambda_i(L) = \nu - \lambda_{n+1-i}(A), \quad i = 1, \ldots, n. $$

We can substitute the above equation in equations (6) and (7) to obtain the optimal solution in
terms of $\lambda_2(A)$ and $\lambda_n(A)$.

Since distance-transitive graphs usually have very large automorphism groups, the eigenvalues
of the adjacency matrix $A$ (and the Laplacian $L$) often have very high multiplicities. But to solve
the FMMC problem, we only need to know the distinct eigenvalues; actually, only $\lambda_2(A)$ and $\lambda_n(A)$
would suffice. In this regard, it is more convenient to use a much smaller matrix, the intersection
matrix, which has all the distinct eigenvalues of the adjacency matrix.

Let $D$ be the diameter of the graph. For a nonnegative integer $k \leq D$, choose any two vertices $u$
and $v$ such that their distance satisfies $\delta(u,v) = k$. Let $a_k$, $b_k$ and $c_k$ be the numbers of vertices
that are adjacent to $u$ and whose distances from $v$ are $k$, $k + 1$ and $k - 1$, respectively. That is,

$$ \begin{array}{l} a_k = |\{ w \in \mathcal{V} \mid \delta(u,w) = 1, \ \delta(w,v) = k \}| \\ b_k = |\{ w \in \mathcal{V} \mid \delta(u,w) = 1, \ \delta(w,v) = k + 1 \}| \\ c_k = |\{ w \in \mathcal{V} \mid \delta(u,w) = 1, \ \delta(w,v) = k - 1 \}|. \end{array} $$

For distance-transitive graphs, these numbers are independent of the particular pair of vertices $u$
and $v$ chosen. Clearly, we have $a_0 = 0$, $b_0 = \nu$ and $c_1 = 1$. The intersection matrix $B$ is the
following tridiagonal $(D + 1) \times (D + 1)$ matrix

$$ B = \begin{bmatrix} a_0 & b_0 & & & \\ c_1 & a_1 & b_1 & & \\ & c_2 & a_2 & \ddots & \\ & & \ddots & \ddots & b_{D-1} \\ & & & c_D & a_D \end{bmatrix}. $$

Denote the eigenvalues of the intersection matrix, in decreasing order, as $\eta_0, \eta_1, \ldots, \eta_D$.
These are precisely the $(D + 1)$ distinct eigenvalues of the adjacency matrix $A$.
In particular, we have

$$ \lambda_1(A) = \eta_0 = \nu, \qquad \lambda_2(A) = \eta_1, \qquad \lambda_n(A) = \eta_D. $$

The following corollary is a direct consequence of Theorem 3.1.

Corollary 3.2. The optimal solution of the FMMC problem on a distance-transitive graph is

$$ \alpha^\star = \min\left\{ \frac{1}{\nu}, \ \frac{2}{2\nu - (\eta_1 + \eta_D)} \right\} \qquad (14) $$

$$ \mu^\star = \max\left\{ \frac{\eta_1}{\nu}, \ \frac{\eta_1 - \eta_D}{2\nu - (\eta_1 + \eta_D)} \right\}. \qquad (15) $$


Next we give solutions for the FMMC problem on several families of distance-transitive graphs. 
3.3.1 Complete graphs 

The case of the complete graph with $n$ vertices, usually called $K_n$, is very simple. It is distance-transitive,
with diameter $D = 1$ and valency $\nu = n - 1$. The intersection matrix is

$$ B = \begin{bmatrix} 0 & n - 1 \\ 1 & n - 2 \end{bmatrix} $$

with eigenvalues $\eta_0 = n - 1$, $\eta_1 = -1$. Using equations (14) and (15), the optimal parameters are

$$ \alpha^\star = \frac{1}{n}, \qquad \mu^\star = 0. $$

The associated matrix $P = (1/n)\mathbf{1}\mathbf{1}^T$ has one eigenvalue equal to 1, and the remaining $n - 1$
eigenvalues vanish. Such Markov chains achieve perfect mixing after just one step, regardless of
the value of $n$.

3.3.2 Petersen graph 

The Petersen graph, shown in Figure 7, is a well-known distance-transitive graph with 10 vertices and 15 edges. The diameter of the graph is $D = 2$, and the intersection matrix is

\[
B = \begin{bmatrix} 0 & 3 & 0 \\ 1 & 0 & 2 \\ 0 & 1 & 2 \end{bmatrix}
\]

with eigenvalues $\eta_0 = 3$, $\eta_1 = 1$ and $\eta_2 = -2$. Applying the formulas (14) and (15), we obtain

\[
p^\star = \frac{2}{7}, \qquad \mu^\star = \frac{3}{7}.
\]

Figure 7: The Petersen graph.

3.3.3 Hamming graphs

The Hamming graphs, denoted $H(d,n)$, have vertices labeled by elements in the Cartesian product $\{1,\ldots,n\}^d$, with two vertices being adjacent if they differ in exactly one component. By the definition, it is clear that Hamming graphs are isomorphic to the Cartesian product of $d$ copies of the complete graph $K_n$. Hamming graphs are distance-transitive, with diameter $D = d$ and valency $\nu = d(n-1)$. Their eigenvalues are given by $\eta_k = d(n-1) - kn$ for $k = 0,\ldots,d$. These can be obtained using an equation for eigenvalues of adjacency matrices, similar to (13), with the eigenvalues of $K_n$ being $n - 1$ and $-1$. Therefore the FMMC has parameters:

\[
p^\star = \min\left\{ \frac{1}{d(n-1)},\ \frac{2}{n(d+1)} \right\}, \qquad
\mu^\star = \max\left\{ 1 - \frac{n}{d(n-1)},\ \frac{d-1}{d+1} \right\}.
\]

We note that hypercubes (see §3.2.2) are special Hamming graphs with $n = 2$.

3.3.4 Johnson graphs

The Johnson graph $J(n,q)$ (for $1 \le q \le n/2$) is defined as follows: the vertices are the $q$-element subsets of $\{1,\ldots,n\}$, with two vertices being connected by an edge if and only if the subsets differ in exactly one element. It is a distance-transitive graph, with $\binom{n}{q}$ vertices and $\frac{1}{2} q(n-q)\binom{n}{q}$ edges. It has valency $\nu = q(n-q)$ and diameter $D = q$. The eigenvalues of the intersection matrix can be computed analytically and they are:

\[
\eta_k = q(n-q) + k(k-n-1), \qquad k = 0,\ldots,q.
\]

Therefore, by Corollary 3.2, we obtain the optimal transition probability

\[
p^\star = \min\left\{ \frac{1}{q(n-q)},\ \frac{2}{qn + n + q - q^2} \right\}
\]

and the smallest SLEM

\[
\mu^\star = \max\left\{ 1 - \frac{n}{q(n-q)},\ 1 - \frac{2n}{qn + n + q - q^2} \right\}.
\]

4 FMMC on orbit graphs

For graphs with large automorphism groups, the eigenvalues of the transition probability matrix often have very high multiplicities. To solve the FMMC problem, it suffices to work with only the distinct eigenvalues without consideration of their multiplicities. This is exactly what the intersection matrix does for distance-transitive graphs. In this section we develop similar tools for more general graphs. More specifically, we show how to construct an orbit chain which is much smaller in size than the original Markov chain, but contains all its distinct eigenvalues (with much smaller multiplicities). The FMMC on the original graph can be found by solving a much smaller problem on the orbit chain.

4.1 Orbit theory




Here we review the orbit theory developed in [BDPX05]. Let $P$ be a symmetric Markov chain on the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, and let $H$ be a group of automorphisms of the graph. Often, it is a subgroup of the full automorphism group $\mathrm{Aut}(\mathcal{G})$. The vertex set $\mathcal{V}$ partitions into orbits $O_v = \{hv : h \in H\}$. For notational convenience, in this section we use $P(v,u)$, for $v,u \in \mathcal{V}$, to denote entries of the transition probability matrix. We define the orbit chain by specifying the transition probabilities between orbits:

\[
P_H(O_v, O_u) = P(v, O_u) = \sum_{u' \in O_u} P(v,u'). \tag{16}
\]

This transition probability is independent of which $v \in O_v$ is chosen, so it is well defined and the lumped orbit chain is indeed Markov.

The orbit chain is in general no longer symmetric, but it is always reversible. Let $\pi(i)$, $i \in \mathcal{V}$, be the stationary distribution of the original Markov chain. Then the stationary distribution on the orbit chain is obtained as

\[
\pi_H(O_v) = \sum_{i \in O_v} \pi(i). \tag{17}
\]

It can be verified that

\[
\pi_H(O_v)\, P_H(O_v, O_u) = \pi_H(O_u)\, P_H(O_u, O_v), \tag{18}
\]

which is the detailed balance condition to test reversibility.
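
The lumping in (16) and (17) can be sketched in a few lines. The helper name `orbit_chain` and the 4-cycle example are illustrative assumptions, not part of the development above; the orbit partition is supplied by hand rather than computed from a group.

```python
import numpy as np

def orbit_chain(P, orbits):
    """Lump a symmetric chain P over vertex orbits, following (16) and (17).

    orbits is a list of lists of vertex indices that partition range(n).
    Returns (P_H, pi_H).
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    k = len(orbits)
    P_H = np.zeros((k, k))
    for a, Oa in enumerate(orbits):
        v = Oa[0]                          # any representative gives the same row
        for b, Ob in enumerate(orbits):
            P_H[a, b] = P[v, Ob].sum()     # P(v, O_u): sum over u' in O_u
    # A symmetric P has a uniform stationary distribution, so (17) gives |O|/n.
    pi_H = np.array([len(O) / n for O in orbits])
    return P_H, pi_H

# Example: the 4-cycle 0-1-2-3-0 with probability p on every edge, lumped
# under the reflection that fixes vertices 0 and 2 and swaps 1 and 3.
p = 0.3
P = np.array([[1 - 2*p, p, 0, p],
              [p, 1 - 2*p, p, 0],
              [0, p, 1 - 2*p, p],
              [p, 0, p, 1 - 2*p]])
P_H, pi_H = orbit_chain(P, [[0], [1, 3], [2]])
flux = pi_H[:, None] * P_H                 # flux[a, b] = pi_H(a) * P_H(a, b)
```

Here `P_H` is not symmetric, but `flux` is, which is exactly the detailed balance condition (18).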

The following is a summary of the orbit theory we developed in [BDPX05], which relates the eigenvalues and eigenvectors of the orbit chain $P_H$ to the eigenvalues and eigenvectors of the original chain $P$.

• Lifting ([BDPX05, §3.1]). If $\bar\lambda$ is an eigenvalue of $P_H$ with associated eigenvector $\bar f$, then $\bar\lambda$ is an eigenvalue of $P$ with $H$-invariant eigenfunction $f(v) = \bar f(O_v)$. Conversely, every $H$-invariant eigenfunction appears uniquely from this construction.

• Projection ([BDPX05, §3.2]). Let $\lambda$ be an eigenvalue of $P$ with eigenvector $f$. Define $\bar f(O_v) = \sum_{h \in H} f(h^{-1}(v))$. Then $\lambda$ appears as an eigenvalue of $P_H$, with eigenvector $\bar f$, if either of the following two conditions holds:

(a) $H$ has a fixed point $v^*$ and $f(v^*) \ne 0$.

(b) $f$ is nonzero at a vertex $v^*$ in an $\mathrm{Aut}(\mathcal{G})$-orbit which contains a fixed point of $H$.

Equipped with this orbit theory, we would like to construct one or multiple orbit chains that retain all the eigenvalues of the original chain. Ideally the orbit chains are much smaller in size than the original chain, with the eigenvalues having much smaller multiplicities. The following theorem (Theorem 3.7 in [BDPX05]) gives sufficient conditions that guarantee that the orbit chain(s) attain all the eigenvalues of the original chain.

Theorem 4.1. Suppose that $\mathcal{V} = O_1 \cup \cdots \cup O_K$ is a disjoint union of the orbits under $\mathrm{Aut}(\mathcal{G})$. Let $H_i$ be the subgroup of $\mathrm{Aut}(\mathcal{G})$ that has a fixed point in $O_i$. Then all eigenvalues of $P$ occur among the eigenvalues of $\{P_{H_i}\}_{i=1}^K$. Further, every eigenvector of $P$ occurs by lifting an eigenvector of some $P_{H_i}$.

Observe that if $H \subseteq G \subseteq \mathrm{Aut}(\mathcal{G})$, then the eigenvalues of $P_H$ contain all eigenvalues of $P_G$. This allows disregarding some of the $H_i$ in Theorem 4.1. In particular, it is possible to construct a single orbit chain that contains all eigenvalues of the original chain. Therefore we have the following corollary.

Corollary 4.2. Suppose that $\mathcal{V} = O_1 \cup \cdots \cup O_K$ is a disjoint union of the orbits under $\mathrm{Aut}(\mathcal{G})$, and $H$ is a subgroup of $\mathrm{Aut}(\mathcal{G})$. If $H$ has a fixed point in every $O_i$, then all distinct eigenvalues of $P$ occur among the eigenvalues of $P_H$.

Remarks. To find $H$ in the above corollary, we can just compute the corresponding stabilizer, i.e., compute the largest subgroup of $\mathrm{Aut}(\mathcal{G})$ that fixes one point in each orbit. Note that the $H$ promised by the corollary may be trivial in some cases; see the example in §5.3.6.

Figure 8: Orbit chains of $K_{m,n}$ under different automorphism groups: (a) orbit chain under $S_m \times S_n$; (b) orbit chain under $S_{m-1} \times S_n$; (c) orbit chain under $S_m \times S_{n-1}$. The vertices labeled $O_u$ and $O_v$ are orbits of vertices $u$ and $v$ (labeled in the figure of $K_{m,n}$) under the corresponding actions. The vertices labeled $x$ and $y$ are fixed points.

We illustrate the orbit theory with the bipartite graph $K_{m,n}$ (shown in an earlier figure). It is easy to see that $\mathrm{Aut}(K_{m,n})$ is the direct product of two symmetric groups, namely $S_m \times S_n$, with each symmetric group permuting one of the two subsets of vertices. This graph is edge-transitive, so we assign the same transition probability $p$ on all the edges.

The orbit chains under four different subgroups of $\mathrm{Aut}(K_{m,n})$ are shown in Figure 8. The transition probabilities between orbits are calculated using equation (16). Since the transition probabilities are not symmetric, we represent the orbit chains by directed graphs, with different transition probabilities labeled on opposite directions between two adjacent vertices. The full automorphism group $\mathrm{Aut}(K_{m,n})$ has two orbits of vertices; see Figure 8(a). The orbit chains under the subgroups $S_{m-1} \times S_n$ (Figure 8(b)) and $S_m \times S_{n-1}$ (Figure 8(c)) each contain a fixed point in one of the two orbits under $\mathrm{Aut}(K_{m,n})$. By Theorem 4.1, these two orbit chains together contain all the distinct eigenvalues of the original chain on $K_{m,n}$. Alternatively, we can construct the orbit chain under the subgroup $S_{m-1} \times S_{n-1}$, shown in Figure 8(d). This orbit chain contains a fixed point in both orbits under $\mathrm{Aut}(K_{m,n})$. By Corollary 4.2, all distinct eigenvalues of $K_{m,n}$ appear in this orbit chain. In particular, this shows that there are at most four distinct eigenvalues in the original chain.

If we order the vertices in Figure 8(d) as $(x, y, O_u, O_v)$, then the transition probability matrix for this orbit chain is

\[
P_H = \begin{bmatrix}
1 - np & p & 0 & (n-1)p \\
p & 1 - mp & (m-1)p & 0 \\
0 & p & 1 - np & (n-1)p \\
p & 0 & (m-1)p & 1 - mp
\end{bmatrix},
\]

where $H = S_{m-1} \times S_{n-1}$. By equation (17), its stationary distribution is

\[
\pi_H = \left( \frac{1}{m+n},\ \frac{1}{m+n},\ \frac{m-1}{m+n},\ \frac{n-1}{m+n} \right).
\]
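
The claim that this four-state orbit chain carries every distinct eigenvalue of the chain on $K_{m,n}$ can be checked numerically. The sketch below uses assumed illustrative values $m = 3$, $n = 4$, $p = 0.1$, builds the full chain as $P = I - pL$, and compares spectra.

```python
import numpy as np

m, n, p = 3, 4, 0.1   # illustrative sizes and edge probability (assumed values)

# Full chain on K_{m,n}: vertices 0..m-1 on one side, m..m+n-1 on the other.
A = np.zeros((m + n, m + n))
A[:m, m:] = 1.0
A[m:, :m] = 1.0
P = np.eye(m + n) - p * (np.diag(A.sum(axis=1)) - A)   # P = I - p L

# Orbit chain under S_{m-1} x S_{n-1}, ordered as (x, y, O_u, O_v).
P_H = np.array([
    [1 - n*p, p,       0.0,       (n - 1)*p],
    [p,       1 - m*p, (m - 1)*p, 0.0      ],
    [0.0,     p,       1 - n*p,   (n - 1)*p],
    [p,       0.0,     (m - 1)*p, 1 - m*p  ],
])

# Distinct eigenvalues of the full chain versus the spectrum of the orbit chain.
distinct_full = np.unique(np.round(np.linalg.eigvalsh(P), 9))
orbit_eigs = np.sort(np.linalg.eigvals(P_H).real)
```

With these values both spectra reduce to $\{1 - (m+n)p,\ 1 - np,\ 1 - mp,\ 1\}$.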



4.2 Fastest mixing reversible Markov chain on orbit graph 

Since in general the orbit chain is no longer symmetric, we cannot directly use the convex optimization formulation (1) or (2) to minimize $\mu(P_H)$. Fortunately, the detailed balance condition (18) leads to a simple transformation that allows us to formulate the problem of finding the fastest reversible Markov chain as a convex program [BDX04].

Suppose the orbit chain $P_H$ contains all distinct eigenvalues of the original chain. Let $\pi_H$ be the stationary distribution of the orbits, and let $\Pi = \mathrm{Diag}(\pi_H)$. The detailed balance condition (18) can be written as $\Pi P_H = P_H^T \Pi$, which implies that the matrix $\Pi^{1/2} P_H \Pi^{-1/2}$ is symmetric (and, of course, has the same eigenvalues as $P_H$). The eigenvector of $\Pi^{1/2} P_H \Pi^{-1/2}$ associated with the maximum eigenvalue 1 is $q = (\sqrt{\pi_H(O_1)}, \ldots, \sqrt{\pi_H(O_k)})$. The SLEM $\mu(P_H)$ equals the spectral norm of $\Pi^{1/2} P_H \Pi^{-1/2}$ restricted to the orthogonal complement of the subspace spanned by $q$. This can be written as

\[
\mu(P_H) = \left\| (I - qq^T)\, \Pi^{1/2} P_H \Pi^{-1/2}\, (I - qq^T) \right\|_2 = \left\| \Pi^{1/2} P_H \Pi^{-1/2} - qq^T \right\|_2 .
\]

Introducing a scalar variable $s$ to bound the above spectral norm, we can formulate the fastest mixing reversible Markov chain problem as an SDP

\[
\begin{array}{ll}
\text{minimize} & s \\
\text{subject to} & -sI \preceq \Pi^{1/2} P_H \Pi^{-1/2} - qq^T \preceq sI \\
 & P_H \ge 0, \quad P_H \mathbf{1} = \mathbf{1}, \quad \Pi P_H = P_H^T \Pi \\
 & P_H(O,O') = 0, \quad (O,O') \notin \mathcal{E}_H.
\end{array} \tag{19}
\]

The optimization variables are the matrix $P_H$ and the scalar $s$, and the problem data is given by the orbit graph and the stationary distribution $\pi_H$. Note that the reversibility constraint $\Pi P_H = P_H^T \Pi$ can be dropped since it is always satisfied by the construction of the orbit chain; see equation (18). By pre- and post-multiplying the matrix inequality by $\Pi^{1/2}$, we can then write another equivalent formulation:

\[
\begin{array}{ll}
\text{minimize} & s \\
\text{subject to} & -s\Pi \preceq \Pi P_H - \pi_H \pi_H^T \preceq s\Pi \\
 & P_H \ge 0, \quad P_H \mathbf{1} = \mathbf{1} \\
 & P_H(O,O') = 0, \quad (O,O') \notin \mathcal{E}_H.
\end{array} \tag{20}
\]
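
The symmetrization behind these formulations can be sketched numerically: the function below evaluates the spectral-norm expression for $\mu(P_H)$ on a given reversible chain. It only evaluates the objective, it does not solve the SDP; the name `slem_reversible` and the values $m = 3$, $n = 4$, $p = 0.1$ are assumptions for illustration.

```python
import numpy as np

def slem_reversible(P_H, pi_H):
    """SLEM of a reversible chain as || Pi^{1/2} P_H Pi^{-1/2} - q q^T ||_2."""
    P_H = np.asarray(P_H, dtype=float)
    q = np.sqrt(np.asarray(pi_H, dtype=float))
    S = (q[:, None] * P_H) / q[None, :]        # Pi^{1/2} P_H Pi^{-1/2}
    return np.linalg.norm(S - np.outer(q, q), 2)

# Orbit chain of K_{m,n} under S_{m-1} x S_{n-1}, ordered (x, y, O_u, O_v).
m, n, p = 3, 4, 0.1
P_H = np.array([
    [1 - n*p, p,       0.0,       (n - 1)*p],
    [p,       1 - m*p, (m - 1)*p, 0.0      ],
    [0.0,     p,       1 - n*p,   (n - 1)*p],
    [p,       0.0,     (m - 1)*p, 1 - m*p  ],
])
pi_H = np.array([1, 1, m - 1, n - 1]) / (m + n)
mu = slem_reversible(P_H, pi_H)

# Cross-check against the second-largest eigenvalue modulus of P_H itself.
mu_from_eigs = np.sort(np.abs(np.linalg.eigvals(P_H)))[-2]
```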

To solve the fastest mixing reversible Markov chain problem on the orbit graph, we need the following three steps.

1. Conduct symmetry analysis on the original graph: identify the automorphism group $\mathrm{Aut}(\mathcal{G})$ and determine the number of orbits of edges $N$. By Corollary 2.2, this is the number of transition probabilities we need to consider.

2. Find a group of automorphisms $H$ that satisfies the conditions in Corollary 4.2. Construct its orbit chain by computing the transition probabilities using equation (16), and compute the stationary distribution using equation (17). Note that the entries of $P_H$ are multiples of the transition probabilities on the original graph.

3. Solve the fastest mixing reversible Markov chain problem (19). The optimal SLEM $\mu(P_H)$ is also the optimal SLEM for the original chain, and the optimal transition probabilities on the original chain can be obtained by simple scaling of the optimal orbit transition probabilities.
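
For an edge-transitive graph there is a single variable $p$, so the third step can be imitated without an SDP solver. The sketch below substitutes a plain grid search over $p$ for the SDP (19) on the orbit chain of $K_{m,n}$; this is a swapped-in technique for illustration only, and the helper names, the grid resolution, and the choice $m = 2$, $n = 3$ are assumptions.

```python
import numpy as np

def kmn_orbit_chain(m, n, p):
    """Orbit chain of K_{m,n} under S_{m-1} x S_{n-1}, ordered (x, y, O_u, O_v)."""
    return np.array([
        [1 - n*p, p,       0.0,       (n - 1)*p],
        [p,       1 - m*p, (m - 1)*p, 0.0      ],
        [0.0,     p,       1 - n*p,   (n - 1)*p],
        [p,       0.0,     (m - 1)*p, 1 - m*p  ],
    ])

def slem(P_H):
    """Second-largest eigenvalue modulus (the largest modulus is 1)."""
    return np.sort(np.abs(np.linalg.eigvals(P_H)))[-2]

m, n = 2, 3
# Feasible range: all entries of the chain stay nonnegative when p <= 1/max(m, n).
grid = np.linspace(1e-3, 1.0 / max(m, n), 2000)
values = np.array([slem(kmn_orbit_chain(m, n, p)) for p in grid])
p_best = grid[values.argmin()]
mu_best = values.min()
```

For $K_{2,3}$ the search settles near $p = 2/7$, where the eigenvalues $1 - mp$ and $1 - (m+n)p$ have equal modulus $3/7$.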

We have assumed a single orbit chain that contains all distinct eigenvalues of the original chain. Sometimes it is more convenient to use multiple orbit chains. Let $P_{H_i}$, $i = 1,\ldots,K$, be the collection of orbit chains in Theorem 4.1. In this case we need to minimize $\max_i \mu(P_{H_i})$. This can be done by simply adding the set of constraints in (19) for every matrix $P_{H_i}$. For example, for the complete bipartite graph $K_{m,n}$, instead of using the single orbit chain in Figure 8(d), we can use the two orbit chains in Figure 8(b) and Figure 8(c) together, with two sets of constraints in the SDP (19).

4.3 Examples

We demonstrate the above computational procedure on orbit graphs with two examples: the graph $K_n$-$K_n$ and the complete binary tree. Both examples will be revisited in §5 using the method of block diagonalization.

4.3.1 The graph $K_n$-$K_n$

The graph $K_n$-$K_n$ consists of two copies of the complete graph $K_n$ joined by a bridge (see Figure 9(a)). We follow the three steps described in §4.2.

First, it is clear by inspection that the full automorphism group is $C_2 \ltimes (S_{n-1} \times S_{n-1})$. The actions of $S_{n-1} \times S_{n-1}$ are all possible permutations of the two sets of $n - 1$ vertices, distinct from the two center vertices $x$ and $y$, among themselves. The group $C_2$ acts on the graph by switching the two halves. The semi-direct product symbol $\ltimes$ means that the actions of $S_{n-1} \times S_{n-1}$ and $C_2$ do not commute.

By the symmetry analysis in §2, there are three edge orbits under the full automorphism group: the bridging edge between vertices $x$ and $y$, the edges connecting $x$ and $y$ to all other vertices, and the edges connecting all other vertices. Thus it suffices to consider just three transition probabilities $p_0$, $p_1$, and $p_2$, each labeled in Figure 9(a) on one representative of the three edge orbits.



As the second step, we construct the orbit chains. The orbit chain of $K_n$-$K_n$ under the full automorphism group is depicted in Figure 9(b). The orbit $O_x$ includes vertices $x$ and $y$, and the orbit $O_z$ consists of all other $2(n-1)$ vertices. The transition probabilities of this orbit chain are calculated from equation (16) and are labeled on the directed edges in Figure 9(b). Similarly, the orbit chain under the subgroup $S_{n-1} \times S_{n-1}$ is depicted in Figure 9(c). While these two orbit chains are the most obvious to construct, neither of them contains all eigenvalues of the original chain, nor does their combination. For the one in Figure 9(b), the full automorphism group does not have a fixed point in either of its orbits $O_x$ or $O_z$. For the one in Figure 9(c), the automorphism group $S_{n-1} \times S_{n-1}$ has a fixed point in $O_x$ (either $x$ or $y$), but does not have a fixed point in $O_z$ (note that here $O_z$ is the orbit of $z$ under the full automorphism group). To fix the problem, we consider the orbit chain under the group $S_{n-2} \times S_{n-1}$, which leaves the vertices $x$, $y$, and $z$ fixed, while permuting the remaining $n - 2$ vertices on the left and the $n - 1$ vertices on the right, respectively. The corresponding orbit chain is shown in Figure 9(d). By Corollary 4.2, all distinct eigenvalues of the original Markov chain on $K_n$-$K_n$ appear as eigenvalues of this orbit chain. Thus there are at most five distinct eigenvalues in the original chain no matter how large $n$ is.

Figure 9: The graph $K_n$-$K_n$ and its orbit chains under different automorphism groups: (a) the graph $K_n$-$K_n$; (b) orbit chain under $C_2 \ltimes (S_{n-1} \times S_{n-1})$; (c) orbit chain under $S_{n-1} \times S_{n-1}$; (d) orbit chain under $S_{n-2} \times S_{n-1}$. Here $O_x$, $O_z$, $O_u$, $O_v$ represent orbits of the vertices $x$, $z$, $u$, $v$ (labeled in Figure 9(a)), respectively, under the corresponding automorphism groups in each subgraph.

To finish the second step, we calculate the transition probabilities of the orbit chain under $H = S_{n-2} \times S_{n-1}$ using equation (16) and label them in Figure 9(d). If we order the vertices of this orbit chain as $(x, y, z, O_u, O_v)$, then the transition probability matrix on the orbit chain is

\[
P_H = \begin{bmatrix}
1 - p_0 - (n-1)p_1 & p_0 & p_1 & (n-2)p_1 & 0 \\
p_0 & 1 - p_0 - (n-1)p_1 & 0 & 0 & (n-1)p_1 \\
p_1 & 0 & 1 - p_1 - (n-2)p_2 & (n-2)p_2 & 0 \\
p_1 & 0 & p_2 & 1 - p_1 - p_2 & 0 \\
0 & p_1 & 0 & 0 & 1 - p_1
\end{bmatrix}.
\]

By equation (17), the stationary distribution of the orbit chain is

\[
\pi_H = \left( \frac{1}{2n},\ \frac{1}{2n},\ \frac{1}{2n},\ \frac{n-2}{2n},\ \frac{n-1}{2n} \right).
\]

As the third step, we solve the SDP (19) with the above parametrization. It is remarkable that we only need to solve an SDP with 4 variables (three transition probabilities $p_0$, $p_1$, $p_2$, and the extra scalar $s$) and $5 \times 5$ matrices no matter how large the graph (the number $n$) is.

We will revisit this example in §5.3.4 using the block diagonalization method, where we present an analytic expression for the exact optimal SLEM and corresponding transition probabilities.
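
The five-state orbit chain can be compared against the full $2n$-vertex chain for a small instance. The sketch below uses assumed values $n = 4$ and $(p_0, p_1, p_2) = (0.2, 0.1, 0.15)$, and an assumed vertex layout in which the left clique occupies indices $0,\ldots,n-1$ with center $x = 0$ and the right clique occupies $n,\ldots,2n-1$ with center $y = n$.

```python
import numpy as np

n = 4                          # size of each clique (illustrative)
p0, p1, p2 = 0.2, 0.1, 0.15    # illustrative edge probabilities

N = 2 * n
W = np.zeros((N, N))           # symmetric matrix of edge transition probabilities
for center in (0, n):
    others = range(center + 1, center + n)
    for i in others:
        W[center, i] = W[i, center] = p1     # center to non-center vertices
        for j in others:
            if i != j:
                W[i, j] = p2                 # between non-center vertices
W[0, n] = W[n, 0] = p0                       # the bridge between x and y
P = W + np.diag(1.0 - W.sum(axis=1))

# Orbit chain under S_{n-2} x S_{n-1}, ordered as (x, y, z, O_u, O_v).
P_H = np.array([
    [1 - p0 - (n-1)*p1, p0, p1, (n-2)*p1, 0],
    [p0, 1 - p0 - (n-1)*p1, 0, 0, (n-1)*p1],
    [p1, 0, 1 - p1 - (n-2)*p2, (n-2)*p2, 0],
    [p1, 0, p2, 1 - p1 - p2, 0],
    [0, p1, 0, 0, 1 - p1],
])

full = np.linalg.eigvalsh(P)
orbit = np.linalg.eigvals(P_H).real
# Largest distance from an eigenvalue of one chain to the spectrum of the other.
gap_full = max(np.min(np.abs(orbit - lam)) for lam in full)
gap_orbit = max(np.min(np.abs(full - lam)) for lam in orbit)
```

Both gaps vanish up to rounding, consistent with Corollary 4.2 (every eigenvalue of $P$ appears in $P_H$) and with lifting (every eigenvalue of $P_H$ is an eigenvalue of $P$).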



4.3.2 Complete binary tree 

We consider a complete binary tree with $n$ levels of branches, denoted as $T_n$. The total number of nodes is $|\mathcal{V}| = 2^{n+1} - 1$. The matrix inequalities in the corresponding SDP have size $|\mathcal{V}| \times |\mathcal{V}|$, which is clearly exponential in $n$. However, the binary tree has a very large automorphism group, of size $2^{(2^n - 1)}$. This automorphism group is best described recursively. Plainly, for $n = 1$, we have $\mathrm{Aut}(T_1) = S_2$. For $n > 1$, it can be obtained by the recursion

\[
\mathrm{Aut}(T_{k+1}) = \mathrm{Aut}(T_k) \wr S_2, \qquad k = 1,\ldots,n-1,
\]

where $\wr$ represents the wreath product of two groups (e.g., [JK81]). More specifically, let $g = (g_1, g_2)$ and $h = (h_1, h_2)$ be elements of the product group $\mathrm{Aut}(T_k) \times \mathrm{Aut}(T_k)$, and let $\sigma$ and $\pi$ be in $S_2$. The multiplication rule of the wreath product is

\[
(g,\sigma)(h,\pi) = \left( (g_1 h_{\sigma^{-1}(1)},\ g_2 h_{\sigma^{-1}(2)}),\ \sigma\pi \right).
\]

This is a semi-direct product $\mathrm{Aut}(T_k)^2 \rtimes S_2$ (cf. the automorphism group of $K_n$-$K_n$). From the above recursion, the automorphism group of $T_n$ is

\[
\mathrm{Aut}(T_n) = S_2 \wr S_2 \wr \cdots \wr S_2 \qquad (n \text{ times}).
\]

(The wreath product is associative, but not commutative.) The representation theory of the automorphism group of the binary tree has been thoroughly studied as this group is the Sylow 2-subgroup of a symmetric group; see [OOR04, AV05].

Figure 10: Orbit graphs of the complete binary tree $T_n$ ($n = 3$) under different automorphism groups: (a) orbit graph and chain under $S_2 \wr S_2 \wr S_2$; (b) orbit graph under $(S_2 \wr S_2) \times (S_2 \wr S_2)$. The vertices surrounded by a circle are fixed points of the corresponding automorphism group.

The orbit graph of $T_n$ under its full automorphism group is a path with $n + 1$ nodes (Figure 10(a), left). Since there are $n$ orbits of edges, there are $n$ different transition probabilities we need to consider. We label them as $p_k$, $k = 1,\ldots,n$, from top to bottom of the tree. The corresponding orbit chain, represented by a directed graph labeled with transition probabilities between orbits, is shown on the right of Figure 10(a). To simplify presentation, only the orbit graphs are shown in other subfigures of Figure 10. The corresponding orbit chains should be straightforward to construct.

The largest subgroup of $\mathrm{Aut}(T_n)$ that has a fixed point in every orbit under $\mathrm{Aut}(T_n)$ is

\[
W_n = \prod_{k=1}^{n-1} \underbrace{(S_2 \wr \cdots \wr S_2)}_{k \text{ times}},
\]

where $\prod$ denotes direct product of groups. The corresponding orbit graph is shown in Figure 10(d) for $n = 3$. The number of vertices in this orbit graph is

\[
1 + 2 + \cdots + n + (n+1) = \binom{n+2}{2} = \frac{1}{2}(n+1)(n+2),
\]

which is much smaller than $2^{n+1} - 1$, the size of $T_n$.
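
The group order $2^{(2^n - 1)}$ and the size comparison above can be sanity-checked by brute force for very small trees. The sketch below enumerates all vertex permutations, so it is only feasible for $n \le 2$; the heap-style vertex numbering and the function names are assumptions made for illustration.

```python
from itertools import permutations

def complete_binary_tree_edges(n):
    """Edges of T_n in heap numbering: node i has children 2i+1 and 2i+2."""
    size = 2 ** (n + 1) - 1
    return {(i, c) for i in range(size) for c in (2*i + 1, 2*i + 2) if c < size}

def count_automorphisms(n):
    """Brute-force |Aut(T_n)| by testing every vertex permutation."""
    edges = complete_binary_tree_edges(n)
    undirected = edges | {(b, a) for a, b in edges}
    size = 2 ** (n + 1) - 1
    count = 0
    for perm in permutations(range(size)):
        # A permutation is an automorphism when it maps every edge to an edge.
        if all((perm[a], perm[b]) in undirected for a, b in edges):
            count += 1
    return count

def orbit_graph_size(n):
    """Number of vertices 1 + 2 + ... + (n + 1) of the orbit graph under W_n."""
    return (n + 1) * (n + 2) // 2

aut_t1 = count_automorphisms(1)
aut_t2 = count_automorphisms(2)
```

For $n = 3$ the orbit graph has 10 vertices against 15 in the tree; the gap widens quickly, since one count is quadratic in $n$ and the other exponential.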

From the above analysis, we only need to solve the fastest reversible Markov chain problem on the orbit graph of size $\binom{n+2}{2}$ with $n$ variables $p_1,\ldots,p_n$. In the next section, using the technique of block diagonalization, we will see that the transition probability matrix of size $\binom{n+2}{2}$ can be further decomposed into smaller matrices with sizes $1, 2, \ldots, n+1$. Due to an eigenvalue interlacing result, we only need to consider the orbit chain with $2n + 1$ vertices in Figure 10(b).

5 Symmetry reduction by block diagonalization

By definition of the fixed-point subspace $\mathcal{F}$ (in §2.2), any transition probability matrix $P \in \mathcal{F}$ is invariant under the actions of $\mathrm{Aut}(\mathcal{G})$. More specifically, for any permutation matrix $Q$ given by $\sigma \in \mathrm{Aut}(\mathcal{G})$, we have $QPQ^T = P$, or equivalently $QP = PQ$. In this section we show that this property allows the construction of a coordinate transformation matrix that can block diagonalize every $P \in \mathcal{F}$. The resulting blocks usually have much smaller sizes and repeated blocks can be discarded in computation.

The method we use in this section is based on classical group representation theory (e.g., [Ser77]). It was developed for more general SDPs in [GP04], and has found applications in sum-of-squares decomposition for minimizing polynomial functions [Par00, Par03, PS03] and controller design for symmetric dynamical systems [CLP03]. A closely related approach is developed in [dKPS07], which is based on a low-order representation of the commutant (collection of invariant matrices) of the matrix algebra generated by the permutation matrices.

5.1 Some group representation theory 

Let $G$ be a group. A representation $\rho$ of $G$ assigns an invertible matrix $\rho(g)$ to each $g \in G$ in such a way that the matrix assigned to the product of two elements in $G$ is the product of the matrices assigned to each element: $\rho(gh) = \rho(g)\rho(h)$. The matrices we work with are all invertible and are considered over the real or complex numbers. We thus regard $\rho$ as a homomorphism from $G$ to the linear maps on a vector space $V$. The dimension of $\rho$ is the dimension of $V$. Two representations are equivalent if they are related by a fixed similarity transformation.

If $W$ is a subspace of $V$ invariant under $G$, then $\rho$ restricted to $W$ gives a subrepresentation. Of course the zero subspace and the subspace $W = V$ are trivial subrepresentations. If the representation $\rho$ admits no non-trivial subrepresentation, then $\rho$ is called irreducible.

We consider first complex representations, as the theory is considerably simpler in this case. For a finite group $G$ there are only finitely many inequivalent irreducible representations $\vartheta_1, \ldots, \vartheta_h$ of dimensions $n_1, \ldots, n_h$, respectively. The degrees $n_i$ divide the group order $|G|$, and satisfy the condition $\sum_{i=1}^h n_i^2 = |G|$. Every linear representation of $G$ has a canonical decomposition as a direct sum of irreducible representations

\[
\rho = m_1 \vartheta_1 \oplus m_2 \vartheta_2 \oplus \cdots \oplus m_h \vartheta_h,
\]

where $m_1, \ldots, m_h$ are the multiplicities. Accordingly, the representation space $\mathbf{C}^n$ has an isotypic decomposition

\[
\mathbf{C}^n = V_1 \oplus \cdots \oplus V_h, \tag{21}
\]

where each isotypic component consists of $m_i$ invariant subspaces

\[
V_i = V_i^1 \oplus \cdots \oplus V_i^{m_i}, \tag{22}
\]

each of which has dimension $n_i$ and transforms after the manner of $\vartheta_i$. A basis of this decomposition transforming with respect to the matrices $\vartheta_i(g)$ is called symmetry-adapted and can be computed using the algorithm presented in [Ser77, §2.6-2.7] or [FS92, §5.2]. This basis defines a change of coordinates by a matrix $T$ collecting the basis as columns. By Schur's lemma, if a matrix $P$ satisfies

\[
\rho(g) P = P \rho(g), \qquad \forall g \in G, \tag{23}
\]

then $T^{-1} P T$ has block diagonal form with one block $P_i$ for each isotypic component of dimension $m_i n_i$, which further decomposes into $n_i$ equal blocks $B_i$ of dimension $m_i$. That is,

\[
T^{-1} P T = \begin{bmatrix} P_1 & & \\ & \ddots & \\ & & P_h \end{bmatrix},
\qquad
P_i = \begin{bmatrix} B_i & & \\ & \ddots & \\ & & B_i \end{bmatrix}. \tag{24}
\]


For our application of semidefinite programs, the problems are presented in terms of real matrices, and therefore we would like to use real coordinate transformations. In fact a generalization of the classical theory to the real case is presented in [Ser77, §13.2]. If all $\vartheta_i(g)$ are real matrices the irreducible representation is called absolutely irreducible. Otherwise, for each $\vartheta_i$ with complex character its complex conjugate will also appear in the canonical decomposition. Since $\rho$ is real, both will have the same multiplicity and real bases of $V_i + \bar V_i$ can be constructed. So two complex conjugate irreducible representations form one real irreducible representation of complex type. There is a third case, real irreducible representations of quaternionic type, rarely seen in practical examples.

In this paper, we assume that the representation $\rho$ is orthogonal, i.e., $\rho(g)^T \rho(g) = \rho(g)\rho(g)^T = I$ for all $g \in G$. As a result, the transformation matrix $T$ can also be chosen to be orthogonal. Thus $T^{-1} = T^T$ (for complex matrices, it is the conjugate transpose). For symmetric matrices the block corresponding to a representation of complex type or quaternionic type simplifies to a collection of equal subblocks. For the special case of circulant matrices, complete diagonalization reveals all the eigenvalues [Dia88, page 50].
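
The circulant case can be illustrated with a chain on the $n$-cycle carrying the same probability $p$ on every edge, which is invariant under the cyclic group of rotations. In the sketch below the unitary DFT matrix plays the role of the (complex) symmetry-adapted basis; the values $n = 6$ and $p = 0.3$ are assumptions for illustration.

```python
import numpy as np

n, p = 6, 0.3                          # illustrative cycle length and edge probability
c = np.zeros(n)
c[0], c[1], c[-1] = 1 - 2*p, p, p      # first column of P on the n-cycle
P = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])  # circulant

F = np.fft.fft(np.eye(n)) / np.sqrt(n) # unitary DFT matrix
D = F.conj().T @ P @ F                 # diagonal, since every circulant commutes with rotations
fft_eigs = np.sort(np.fft.fft(c).real) # eigenvalues read off from the DFT of the first column

k = np.arange(n)
closed_form = np.sort(1 - 2*p*(1 - np.cos(2*np.pi*k/n)))
```

All irreducible representations of the cyclic group are one-dimensional, so every block has size one and the transformed matrix is fully diagonal.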



5.2 Block diagonalization of SDP constraint 

As in §2.2, for every $\sigma \in \mathrm{Aut}(\mathcal{G})$ we assign a permutation matrix $Q(\sigma)$ by letting $Q_{ij}(\sigma) = 1$ if $i = \sigma(j)$ and $Q_{ij}(\sigma) = 0$ otherwise. This is an $n$-dimensional representation of $\mathrm{Aut}(\mathcal{G})$, which is often called the natural representation. As mentioned in the beginning of this section, every matrix $P$ in the fixed-point subset $\mathcal{F}$ has the symmetry of $\mathrm{Aut}(\mathcal{G})$; i.e., it satisfies the condition (23) with $\rho = Q$. Thus a coordinate transformation matrix $T$ can be constructed such that $P$ can be block diagonalized into the form (24).

Now we consider the SDP (5), which is the FMMC problem formulated in the fixed-point subset $\mathcal{F}$. In §2.3, we have derived the expression $P(p) = I - \sum_{k=1}^N p_k L_k$, where $L_k$ is the Laplacian matrix for the $k$th orbit graph and $p_k$ is the common transition probability assigned on all edges in the $k$th orbit graph. Note that the matrix $P(p)$ has the symmetry of $\mathrm{Aut}(\mathcal{G})$. Applying the coordinate transformation $T$ to the linear matrix inequalities, we obtain the following equivalent problem

\[
\begin{array}{ll}
\text{minimize} & s \\
\text{subject to} & -s I_{m_i} \preceq B_i(p) - J_i \preceq s I_{m_i}, \quad i = 1,\ldots,h \\
 & p_k \ge 0, \quad k = 1,\ldots,N \\
 & \sum_{k=1}^N (L_k)_{ii}\, p_k \le 1, \quad i = 1,\ldots,n
\end{array} \tag{25}
\]

where $B_i(p)$ correspond to the small blocks $B_i$ in (24) of the transformed matrix $T^T P(p) T$, and $J_i$ are the corresponding diagonal blocks of $T^T (1/n)\mathbf{1}\mathbf{1}^T T$. The number of matrix inequalities $h$ is the number of inequivalent irreducible representations, and the size of each matrix inequality $m_i$ is the multiplicity of the corresponding irreducible representation. Note that we only need one out of $n_i$ copies of each $B_i$ in the decomposition (24). Since $m_i$ can be much smaller than $n$ (the number of vertices in the graph), the improvement in computational complexity over the SDP formulation (5) can be significant (see the flop counts discussed in §1.2). This is especially the case when there are high-dimensional irreducible representations (i.e., when $n_i$ is large; see, e.g., $K_n$-$K_n$ defined in §4.3.1).

The transformed SDP formulation (25) needs some further justification. Namely, all the off-diagonal blocks of the matrix $T^T (1/n)\mathbf{1}\mathbf{1}^T T$ have to be zero. This is in fact the case. Moreover, the following theorem reveals an interesting connection between the block diagonalization approach and the orbit theory in §4.

Theorem 5.1. Let $H$ be a subgroup of $\mathrm{Aut}(\mathcal{G})$, and $T$ be the coordinate transformation matrix whose columns are a symmetry-adapted basis for the natural representation of $H$. Suppose a Markov chain $P$ defined on the graph has the symmetry of $H$. Then the matrix $T^T (1/n)\mathbf{1}\mathbf{1}^T T$ has the same block diagonal form as $T^T P T$. Moreover, there is only one nonzero block. Without loss of generality, let this nonzero block be $J_1$ and the corresponding block of $T^T P T$ be $B_1$. These two blocks relate to the orbit chain $P_H$ by

\[
B_1 = \Pi^{1/2} P_H \Pi^{-1/2}, \tag{26}
\]
\[
J_1 = qq^T, \tag{27}
\]

where $\Pi = \mathrm{Diag}(\pi_H)$, $q = \sqrt{\pi_H}$, and $\pi_H$ is the stationary distribution of $P_H$.



Proof. First we note that $P$ always has a single eigenvalue 1 with associated eigenvector $\mathbf{1}$. Thus $\mathbf{1}$ spans an invariant subspace of the natural representation, which is obviously irreducible. The corresponding irreducible representation is isomorphic to the trivial representation (which assigns the scalar 1 to every element in the group). Without loss of generality, let $V_1$ be the isotypic component that contains the vector $\mathbf{1}$. Thus $V_1$ is a direct sum of one-dimensional subspaces spanned by $H$-fixed vectors (each corresponds to a copy of the trivial representation), and $\mathbf{1}$ is a linear combination of these vectors.

Let $m_1$ be the dimension of $V_1$, which is the number of $H$-fixed vectors. We can calculate $m_1$ by Frobenius reciprocity, or "Burnside's Lemma"; see, e.g., [Ser77]. To do so, we note that the character $\chi$ of the natural representation $Q(g)$, $g \in H$, is the number of fixed points of $g$, i.e.,

\[
\chi(g) = \mathrm{Tr}\, Q(g) = \mathrm{FP}(g) = \#\{ v \in \mathcal{V} : g(v) = v \}.
\]

"Burnside's Lemma" says that

\[
\frac{1}{|H|} \sum_{g \in H} \mathrm{FP}(g) = \#\text{orbits}.
\]

The left-hand side is the inner product of $\chi$ with the trivial representation. It thus counts the number of $H$-fixed vectors in $V$. So $m_1$ equals the number of orbits under $H$.

Suppose that $\mathcal{V} = O_1 \cup \cdots \cup O_{m_1}$ as a disjoint union of $H$-orbits. Let $b_i(v) = 1/\sqrt{|O_i|}$ if $v \in O_i$ and zero otherwise. Then $b_1, \ldots, b_{m_1}$ are $H$-fixed vectors, and they form an orthonormal symmetry-adapted basis for $V_1$ (these are not unique). Let $T_1 = [\, b_1 \ \cdots \ b_{m_1} ]$ be the first $m_1$ columns of $T$. They are orthogonal to all other columns of $T$. Since $\mathbf{1}$ is a linear combination of $b_1, \ldots, b_{m_1}$, it is also orthogonal to the other columns of $T$. Therefore the matrix $T^T (1/n)\mathbf{1}\mathbf{1}^T T$ has all its elements zero except for the first $m_1 \times m_1$ diagonal block, which we denote as $J_1$. More specifically, $J_1 = qq^T$ where

\[
q = \frac{1}{\sqrt{n}}\, T_1^T \mathbf{1}
= \frac{1}{\sqrt{n}} \begin{bmatrix} \dfrac{|O_1|}{\sqrt{|O_1|}} & \cdots & \dfrac{|O_{m_1}|}{\sqrt{|O_{m_1}|}} \end{bmatrix}^T
= \begin{bmatrix} \sqrt{\dfrac{|O_1|}{n}} & \cdots & \sqrt{\dfrac{|O_{m_1}|}{n}} \end{bmatrix}^T .
\]

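Note that by (17) the stationary distribution of the orbit chain $P_H$ is

\[
\pi_H = \begin{bmatrix} \dfrac{|O_1|}{n} & \cdots & \dfrac{|O_{m_1}|}{n} \end{bmatrix}^T .
\]

Thus we have $q = \sqrt{\pi_H}$. This proves (27).

Finally we consider the relationship between $B_1 = T_1^T P T_1$ and $P_H$. We prove (26) by showing

\[
P_H = \Pi^{-1/2} B_1 \Pi^{1/2} = \Pi^{-1/2}\, T_1^T P T_1\, \Pi^{1/2}.
\]

It is straightforward to verify that

\[
\Pi^{-1/2} T_1^T = \sqrt{n}\, [\, b'_1 \ \cdots \ b'_{m_1} ]^T, \qquad
b'_i(v) = \begin{cases} 1/|O_i| & \text{if } v \in O_i \\ 0 & \text{if } v \notin O_i, \end{cases}
\]
\[
T_1 \Pi^{1/2} = \frac{1}{\sqrt{n}}\, [\, b''_1 \ \cdots \ b''_{m_1} ], \qquad
b''_i(v) = \begin{cases} 1 & \text{if } v \in O_i \\ 0 & \text{if } v \notin O_i. \end{cases}
\]

The entry at the $i$-th row and $j$-th column of the matrix $\Pi^{-1/2} T_1^T P T_1 \Pi^{1/2}$ is given by

\[
\left( \Pi^{-1/2} T_1^T P T_1 \Pi^{1/2} \right)_{ij} = b_i'^T P\, b''_j
= \frac{1}{|O_i|} \sum_{v \in O_i} \sum_{u \in O_j} P(v,u)
= \frac{1}{|O_i|} \sum_{v \in O_i} P_H(O_i, O_j)
= P_H(O_i, O_j).
\]

In the last equation, we have used the fact that $P_H(O_i, O_j)$ is independent of which $v \in O_i$ is chosen. This completes the proof. □

Figure 11: A 3 × 3 grid graph.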

From Theorem 5.1, we know that $B_1$ contains the eigenvalues of the orbit chain under $H$. Other blocks $B_i$ contain additional eigenvalues (not including those of $P_H$) of the orbit chains under various subgroups of $H$. (Note that the eigenvalues of the orbit chain under $H$ are always contained in the orbit chain under its subgroups.) With this observation, it is possible to identify the multiplicities of eigenvalues in orbit chains under various subgroups of $\mathrm{Aut}(\mathcal{G})$ by relating to the decompositions (21), (22) and (24) (some preliminary results are discussed in [BDPX05]).



5.2.1 A running example 

As a running example for this section, we consider a Markov chain on a 3 × 3 grid $\mathcal{G}$, with a total of 9 nodes (see Figure 11). The automorphism group $\mathrm{Aut}(\mathcal{G})$ is isomorphic to the 8-element dihedral group $D_4$, and corresponds to flips and 90-degree rotations of the graph. The orbits of $\mathrm{Aut}(\mathcal{G})$ acting on the vertices and edges are

\[
\{1,3,7,9\}, \quad \{5\}, \quad \{2,4,6,8\}
\]

and

\[
\{\{1,2\},\{1,4\},\{2,3\},\{3,6\},\{4,7\},\{7,8\},\{6,9\},\{8,9\}\}, \quad \{\{2,5\},\{4,5\},\{5,6\},\{5,8\}\},
\]

respectively. So $\mathcal{G}$ is neither vertex- nor edge-transitive.
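
The vertex orbits listed above, and the orbit count given by Burnside's Lemma in the proof of Theorem 5.1, can be checked by enumerating the eight elements of $D_4$. The sketch assumes the vertices are numbered 1 to 9 in row-major order on the grid; the helper names are illustrative.

```python
def rot90(v):
    """Rotate vertex v (1..9, row-major on the 3 x 3 grid) by 90 degrees."""
    r, c = divmod(v - 1, 3)
    return 3 * c + (2 - r) + 1

def flip(v):
    """Reflect vertex v left-to-right."""
    r, c = divmod(v - 1, 3)
    return 3 * r + (2 - c) + 1

def compose(f, g):
    """Return the map v -> f(g(v))."""
    return lambda v: f(g(v))

identity = lambda v: v
rotations = [identity]
for _ in range(3):
    rotations.append(compose(rot90, rotations[-1]))
group = rotations + [compose(flip, r) for r in rotations]    # the 8 elements of D_4

vertices = range(1, 10)
fixed_point_counts = [sum(g(v) == v for v in vertices) for g in group]
num_orbits = sum(fixed_point_counts) // len(group)            # Burnside's Lemma

orbits = {frozenset(g(v) for g in group) for v in vertices}   # orbits by direct enumeration
```

The fixed-point counts are 9 for the identity, 1 for each of the three nontrivial rotations, and 3 for each of the four reflections, so the average is $24/8 = 3$ orbits.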

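By Corollary 2.2, we associate transition probabilities $a$ and $b$ to the two edge orbits, respectively. The transition probability matrix has the form

\[
P = \begin{bmatrix}
1-2a & a & 0 & a & 0 & 0 & 0 & 0 & 0 \\
a & 1-2a-b & a & 0 & b & 0 & 0 & 0 & 0 \\
0 & a & 1-2a & 0 & 0 & a & 0 & 0 & 0 \\
a & 0 & 0 & 1-2a-b & b & 0 & a & 0 & 0 \\
0 & b & 0 & b & 1-4b & b & 0 & b & 0 \\
0 & 0 & a & 0 & b & 1-2a-b & 0 & 0 & a \\
0 & 0 & 0 & a & 0 & 0 & 1-2a & a & 0 \\
0 & 0 & 0 & 0 & b & 0 & a & 1-2a-b & a \\
0 & 0 & 0 & 0 & 0 & a & 0 & a & 1-2a
\end{bmatrix}.
\]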

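The matrix $P$ satisfies $Q(\sigma)P = PQ(\sigma)$ for every $\sigma \in \mathrm{Aut}(\mathcal{G})$. Using the algorithm in [FS92], we found a symmetry-adapted basis for the representation $Q$, which we take as columns to form

\[
T = \frac{1}{2} \begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & \sqrt{2} & 0 & 0 \\
0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 & 0 & 0 & \sqrt{2} \\
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 & 0 & 0 & -\sqrt{2} \\
0 & 1 & 0 & -1 & 0 & -1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & -\sqrt{2} & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & -1 & 0 & -1 & 0
\end{bmatrix}.
\]
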
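With this coordinate transformation matrix, we obtain

\[
T^T P T = \mathrm{Diag}\left(
\begin{bmatrix} 1-4b & 0 & 2b \\ 0 & 1-2a & 2a \\ 2b & 2a & 1-2a-b \end{bmatrix},\;
1-2a,\;
1-2a-b,\;
\begin{bmatrix} 1-2a & \sqrt{2}a \\ \sqrt{2}a & 1-2a-b \end{bmatrix},\;
\begin{bmatrix} 1-2a & \sqrt{2}a \\ \sqrt{2}a & 1-2a-b \end{bmatrix}
\right).
\]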
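
The block structure of $T^T P T$ can be checked numerically. The following Python sketch is only an illustration: NumPy is our choice of tool, the helper name `grid_chain` is hypothetical, and the sample values of $a$ and $b$ are the rounded optimal values reported for this example. The column ordering of $T$ (center orbit, corner orbit, edge-midpoint orbit, the two one-dimensional components, then two copies of the two-dimensional component) is an assumption; any ordering that groups the isotypic components yields the same blocks.

```python
import numpy as np

def grid_chain(a, b):
    """Transition matrix of the 3x3 grid chain; vertices 1..9 are numbered row by row."""
    outer = [(1, 2), (1, 4), (2, 3), (3, 6), (4, 7), (7, 8), (6, 9), (8, 9)]  # edge orbit with probability a
    inner = [(2, 5), (4, 5), (5, 6), (5, 8)]                                  # edge orbit with probability b
    P = np.zeros((9, 9))
    for edges, prob in ((outer, a), (inner, b)):
        for i, j in edges:
            P[i - 1, j - 1] = P[j - 1, i - 1] = prob
    P += np.diag(1.0 - P.sum(axis=1))  # holding probabilities on the diagonal
    return P

s = np.sqrt(2.0)
# Symmetry-adapted basis (assumed column ordering, see the text above).
T = 0.5 * np.array([
    [0, 1, 0,  1,  0,  s,  0,  0,  0],
    [0, 0, 1,  0, -1,  0,  1,  0,  1],
    [0, 1, 0, -1,  0,  0,  0,  s,  0],
    [0, 0, 1,  0,  1,  0,  1,  0, -1],
    [2, 0, 0,  0,  0,  0,  0,  0,  0],
    [0, 0, 1,  0,  1,  0, -1,  0,  1],
    [0, 1, 0, -1,  0,  0,  0, -s,  0],
    [0, 0, 1,  0, -1,  0, -1,  0, -1],
    [0, 1, 0,  1,  0, -s,  0,  0,  0],
])

a, b = 0.363, 0.2111          # rounded values, used here only as sample inputs
P = grid_chain(a, b)
B = T.T @ P @ T               # should be block diagonal with blocks of size 3, 1, 1, 2, 2

# Largest entry of B outside the expected diagonal blocks.
mask = np.zeros((9, 9), dtype=bool)
for blk in (slice(0, 3), slice(3, 4), slice(4, 5), slice(5, 7), slice(7, 9)):
    mask[blk, blk] = True
off_block_max = np.abs(B[~mask]).max()

# Second-largest eigenvalue modulus of the full chain.
eigs = np.sort(np.linalg.eigvalsh(P))
slem = max(abs(eigs[0]), abs(eigs[-2]))
print(off_block_max, slem)
```

Working with the blocks instead of the full matrix is what shrinks the matrix inequality in the SDP: only one copy of the repeated $2 \times 2$ block needs to be constrained.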



The 3-dimensional block $B_1$ contains the eigenvalue 1 (with multiplicity one), and it is related to the
orbit chain in Figure 12 by the equation (26). The corresponding nonzero block of $T^T (1/n) \mathbf{1}\mathbf{1}^T T$ is

\[
\frac{1}{9} \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix}.
\]

Figure 12: The orbit chain of the 3x3 grid graph.



Next, we substitute the above expressions into the SDP (25) and solve it numerically. Since
there are repeated $2 \times 2$ blocks, the original $9 \times 9$ matrix is replaced by four smaller blocks, of
dimensions 3, 1, 1, 2. The optimal solutions are

\[ a^\star \approx 0.363, \qquad b^\star \approx 0.2111, \qquad \mu^\star \approx 0.6926. \]

Interestingly, it can be shown that these optimal values are not rational; they are algebraic
numbers with the following minimal polynomials:

\begin{align*}
18157\, a^5 - 17020\, a^4 + 6060\, a^3 - 1200\, a^2 + 180\, a - 16 &= 0, \\
1252833\, b^5 - 1625651\, b^4 + 791936\, b^3 - 173536\, b^2 + 15360\, b - 256 &= 0, \\
54471\, \mu^5 - 121430\, \mu^4 + 88474\, \mu^3 - 18216\, \mu^2 - 2393\, \mu + 262 &= 0.
\end{align*}
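
As a quick consistency check, one can confirm that each quintic has a real root near the reported numerical value. The sketch below uses `numpy.roots`; the helper name `nearest_real_root` and the tolerance used to decide that a root is real are our own choices, not part of the original computation.

```python
import numpy as np

# Coefficients (highest degree first) of the three minimal polynomials.
polys = {
    "a":  [18157, -17020, 6060, -1200, 180, -16],
    "b":  [1252833, -1625651, 791936, -173536, 15360, -256],
    "mu": [54471, -121430, 88474, -18216, -2393, 262],
}
# Numerical optima reported from the SDP solution.
reported = {"a": 0.363, "b": 0.2111, "mu": 0.6926}

def nearest_real_root(coeffs, target):
    """Return the real root of the polynomial that lies closest to `target`."""
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    return real[np.argmin(np.abs(real - target))]

nearest = {name: nearest_real_root(c, reported[name]) for name, c in polys.items()}
for name, root in nearest.items():
    print(name, root, abs(root - reported[name]))
```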



5.3 Examples 

We revisit some previous examples with the block diagonalization method, and draw connections
to the method based on orbit theory in §4. We also discuss some additional examples that are
difficult to handle with orbit theory, but are nicely handled by block diagonalization. In many of
the examples, the coordinate transformation matrix $T$ can be constructed directly by inspection.



5.3.1 Complete bipartite graphs 

The complete bipartite graph $K_{m,n}$ (see Figure [?]) is edge-transitive, so we can
assign the same transition probability $p$ on all the edges. The transition probability matrix has the
form

\[
P(p) = \begin{bmatrix}
(1 - np) I_m & p \mathbf{1}_{m \times n} \\
p \mathbf{1}_{n \times m} & (1 - mp) I_n
\end{bmatrix}.
\]

We can easily find a decomposition of the associated matrix algebra. It has three blocks, and
an orthogonal block-diagonalizing change of basis is given by

\[
T = \begin{bmatrix}
(1/\sqrt{m}) \mathbf{1}_{m \times 1} & 0 & F_m & 0 \\
0 & (1/\sqrt{n}) \mathbf{1}_{n \times 1} & 0 & F_n
\end{bmatrix},
\]

where $F_n$ is an $n \times (n-1)$ matrix whose columns are an orthonormal basis of the subspace
orthogonal to $\mathbf{1}_{n \times 1}$.

In the new coordinates, the matrix $T^T P(p) T$ has the following diagonal blocks:

\[
\begin{bmatrix} 1 - np & p\sqrt{mn} \\ p\sqrt{mn} & 1 - mp \end{bmatrix}, \qquad
I_{m-1} \otimes (1 - np), \qquad
I_{n-1} \otimes (1 - mp).
\]

The $2 \times 2$ block has eigenvalues $1$ and $1 - (m+n)p$. The other diagonal blocks reveal the eigenvalues
$1 - mp$ and $1 - np$, with multiplicities $n-1$ and $m-1$, respectively. The optimal solution to the
FMMC problem can be easily obtained as in (10) and (11).

To draw connections to the orbit theory, we note that the above $2 \times 2$ block is precisely $B_1$ in
the equation (26), and the corresponding $P_H$ is the orbit chain shown in Figure 8(a). In addition
to the two eigenvalues in $B_1$, the extra eigenvalue in the orbit chain of Figure 8(b) is $1 - np$, and
the extra eigenvalue in Figure 8(c) is $1 - mp$. All these eigenvalues appear in the orbit chain in
Figure 8(d). As we have seen, the block diagonalization technique reveals the multiplicities in the
original chain of the eigenvalues from various orbit chains.
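
The decomposition for $K_{m,n}$ is easy to reproduce numerically. In the sketch below the helper `complement_basis` (a hypothetical name) builds one possible $F_n$ from a QR factorization; any orthonormal basis of the complement of the all-ones vector would do. The values $m = 3$, $n = 4$, $p = 0.2$ are arbitrary sample inputs with $p \le 1/\max(m, n)$.

```python
import numpy as np

def complement_basis(n):
    """n x (n-1) matrix whose orthonormal columns span the complement of the all-ones vector."""
    q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
    return q[:, 1:]  # the first column of q is proportional to the all-ones vector

m, n, p = 3, 4, 0.2
P = np.block([
    [(1 - n * p) * np.eye(m), p * np.ones((m, n))],
    [p * np.ones((n, m)), (1 - m * p) * np.eye(n)],
])

Fm, Fn = complement_basis(m), complement_basis(n)
T = np.block([
    [np.ones((m, 1)) / np.sqrt(m), np.zeros((m, 1)), Fm, np.zeros((m, n - 1))],
    [np.zeros((n, 1)), np.ones((n, 1)) / np.sqrt(n), np.zeros((n, m - 1)), Fn],
])
B = T.T @ P @ T

# Expected block-diagonal form for this column ordering of T.
expected = np.zeros((m + n, m + n))
expected[:2, :2] = [[1 - n * p, p * np.sqrt(m * n)], [p * np.sqrt(m * n), 1 - m * p]]
expected[2:m + 1, 2:m + 1] = (1 - n * p) * np.eye(m - 1)
expected[m + 1:, m + 1:] = (1 - m * p) * np.eye(n - 1)

block_eigs = np.sort(np.linalg.eigvalsh(B[:2, :2]))
print(np.abs(B - expected).max(), block_eigs)
```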



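5.3.2 Complete $k$-partite graphs

The previous example generalizes nicely to the complete $k$-partite graph $K_{n_1,\ldots,n_k}$. In this case, the
fixed-point reduced matrix has dimension $\sum_i n_i$ and the structure

\[
P(p) = \begin{bmatrix}
(1 - \sum_{j \ne 1} n_j p_{1j}) I_{n_1} & p_{12} \mathbf{1}_{n_1 \times n_2} & \cdots & p_{1k} \mathbf{1}_{n_1 \times n_k} \\
p_{21} \mathbf{1}_{n_2 \times n_1} & (1 - \sum_{j \ne 2} n_j p_{2j}) I_{n_2} & \cdots & p_{2k} \mathbf{1}_{n_2 \times n_k} \\
\vdots & \vdots & \ddots & \vdots \\
p_{k1} \mathbf{1}_{n_k \times n_1} & p_{k2} \mathbf{1}_{n_k \times n_2} & \cdots & (1 - \sum_{j \ne k} n_j p_{kj}) I_{n_k}
\end{bmatrix},
\]

where the probabilities satisfy $p_{ij} = p_{ji}$. There are a total of $\binom{k}{2}$ independent variables.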

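In a very similar fashion to the bipartite case, we can explicitly write the orthogonal coordinate
transformation matrix

\[
T = \begin{bmatrix}
(1/\sqrt{n_1}) \mathbf{1}_{n_1 \times 1} & & & F_{n_1} & & \\
 & \ddots & & & \ddots & \\
 & & (1/\sqrt{n_k}) \mathbf{1}_{n_k \times 1} & & & F_{n_k}
\end{bmatrix}.
\]
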

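The matrix $T^T P(p) T$ decomposes into $k + 1$ blocks: one of dimension $k$, with the remaining $k$
blocks each having dimension $n_i - 1$. The decomposition is:

\[
\begin{bmatrix}
(1 - \sum_{j \ne 1} n_j p_{1j}) & p_{12} \sqrt{n_1 n_2} & \cdots & p_{1k} \sqrt{n_1 n_k} \\
p_{21} \sqrt{n_2 n_1} & (1 - \sum_{j \ne 2} n_j p_{2j}) & \cdots & p_{2k} \sqrt{n_2 n_k} \\
\vdots & \vdots & \ddots & \vdots \\
p_{k1} \sqrt{n_k n_1} & p_{k2} \sqrt{n_k n_2} & \cdots & (1 - \sum_{j \ne k} n_j p_{kj})
\end{bmatrix}
\]

and

\[
I_{n_i - 1} \otimes \Big(1 - \sum_{j \ne i} n_j p_{ij}\Big), \qquad i = 1, \ldots, k.
\]

These blocks can be substituted into the SDP (25) to solve the FMMC problem.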
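
A minimal numerical illustration of this decomposition, for $k = 3$: the spectrum of the full chain should coincide with the eigenvalues of the $k \times k$ block together with each scalar $1 - \sum_{j \ne i} n_j p_{ij}$ repeated $n_i - 1$ times. The part sizes and the probabilities $p_{ij}$ below are arbitrary sample values chosen so that all holding probabilities are nonnegative.

```python
import numpy as np

sizes = [2, 3, 4]                      # n_1, ..., n_k (sample values)
k = len(sizes)
p = np.array([[0.00, 0.10, 0.05],      # symmetric sample probabilities p_ij
              [0.10, 0.00, 0.08],
              [0.05, 0.08, 0.00]])

# Holding probability of a vertex in part i: 1 - sum_{j != i} n_j p_ij.
hold = np.array([1 - sum(sizes[j] * p[i, j] for j in range(k) if j != i) for i in range(k)])

# Full transition matrix of the complete k-partite chain.
P = np.block([[hold[i] * np.eye(sizes[i]) if i == j else p[i, j] * np.ones((sizes[i], sizes[j]))
               for j in range(k)] for i in range(k)])

# Reduced k x k block and the repeated scalar blocks.
reduced = np.array([[hold[i] if i == j else p[i, j] * np.sqrt(sizes[i] * sizes[j])
                     for j in range(k)] for i in range(k)])
predicted = np.sort(np.concatenate(
    [np.linalg.eigvalsh(reduced)] + [np.full(sizes[i] - 1, hold[i]) for i in range(k)]))
actual = np.linalg.eigvalsh(P)
print(np.abs(predicted - actual).max())
```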



5.3.3 Wheel graph 

The wheel graph consists of a center vertex (the hub) and a ring of $n$ peripheral vertices, each
connected to the hub; see Figure 13. It has a total of $n + 1$ nodes. Its automorphism group is isomorphic
to the dihedral group $D_n$ with order $2n$. The transition probability matrix has the structure

\[
P = \begin{bmatrix}
1 - np & p & p & \cdots & p & p \\
p & 1 - p - 2q & q & & & q \\
p & q & 1 - p - 2q & \ddots & & \\
\vdots & & \ddots & \ddots & \ddots & \\
p & & & q & 1 - p - 2q & q \\
p & q & & & q & 1 - p - 2q
\end{bmatrix},
\tag{28}
\]

where $p$ and $q$ are the transition probabilities between the hub and each peripheral vertex, and
between adjacent peripheral vertices, respectively.

Figure 13: The wheel graph with $n = 9$ (total 10 nodes).

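For this structure, the block-diagonalizing transformation is given by

\[
T = \mathrm{Diag}(1, \mathcal{F}_n), \qquad
[\mathcal{F}_n]_{jk} = \frac{1}{\sqrt{n}}\, e^{2\pi \mathrm{i} (j-1)(k-1)/n},
\]

where $\mathcal{F}_n$ is the unitary Fourier matrix of size $n \times n$. As a consequence, the matrix $T^{-1} P T$ is block
diagonal with a $2 \times 2$ matrix and $n - 1$ scalars on its diagonal, given by

\[
\begin{bmatrix} 1 - np & \sqrt{n}\, p \\ \sqrt{n}\, p & 1 - p \end{bmatrix}
\]

and

\[
1 - p + (\omega_n^{k} + \omega_n^{-k} - 2)\, q, \qquad k = 1, \ldots, n-1,
\]

where $\omega_n = e^{2\pi \mathrm{i}/n}$ is an elementary $n$-th root of unity. The $2 \times 2$ block is $B_1$, which contains
the eigenvalues of the orbit chain under $D_n$ (it has only two orbits).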

With the above decomposition, we obtain the optimal solution to the FMMC problem in closed
form:

\[
p^\star = \frac{1}{n}, \qquad
q^\star = \left(1 - \frac{1}{n}\right) \frac{1}{2 - \cos\frac{2\pi}{n} - \cos\frac{2\lfloor n/2 \rfloor \pi}{n}}.
\]

The optimal value of the SLEM is

\[
\mu^\star = \left(1 - \frac{1}{n}\right)
\frac{\cos\frac{2\pi}{n} - \cos\frac{2\lfloor n/2 \rfloor \pi}{n}}{2 - \cos\frac{2\pi}{n} - \cos\frac{2\lfloor n/2 \rfloor \pi}{n}}.
\]

Compared with the optimal solution for the cycle graph in (8) and (9), we see an extra factor
of $1 - 1/n$ in both the SLEM and the transition probability between peripheral vertices. This is
exactly the factor gained by adding the central hub to the pure $n$-cycle.
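
The closed-form expressions can be compared against a direct eigenvalue computation. The sketch below (with a hypothetical helper `wheel_chain` and the sample size $n = 9$, matching the figure) builds the wheel chain at $p = 1/n$ and $q = (1 - 1/n)/(2 - \cos(2\pi/n) - \cos(2\lfloor n/2\rfloor\pi/n))$, and checks both the predicted spectrum and the SLEM. It verifies the formulas for these particular values only; it does not by itself establish optimality.

```python
import numpy as np

def wheel_chain(n, p, q):
    """Wheel graph chain: vertex 0 is the hub, vertices 1..n form the ring."""
    P = np.zeros((n + 1, n + 1))
    P[0, 1:] = P[1:, 0] = p                      # hub <-> peripheral vertices
    for j in range(n):
        i, nxt = 1 + j, 1 + (j + 1) % n
        P[i, nxt] = P[nxt, i] = q                # adjacent peripheral vertices
    P += np.diag(1.0 - P.sum(axis=1))            # holding probabilities
    return P

n = 9
c1 = np.cos(2 * np.pi / n)
c2 = np.cos(2 * (n // 2) * np.pi / n)
p_star = 1.0 / n
q_star = (1 - 1.0 / n) / (2 - c1 - c2)
mu_star = (1 - 1.0 / n) * (c1 - c2) / (2 - c1 - c2)

P = wheel_chain(n, p_star, q_star)
eigs = np.sort(np.linalg.eigvalsh(P))
slem = max(abs(eigs[0]), abs(eigs[-2]))

# Spectrum predicted by the 2x2 block and the n-1 scalar blocks.
ks = np.arange(1, n)
scalars = 1 - p_star + (2 * np.cos(2 * np.pi * ks / n) - 2) * q_star
block = np.array([[1 - n * p_star, np.sqrt(n) * p_star],
                  [np.sqrt(n) * p_star, 1 - p_star]])
predicted = np.sort(np.concatenate([np.linalg.eigvalsh(block), scalars]))
print(slem, mu_star)
```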

The wheel graph is an example for which the block diagonalization technique works out nicely,
while the orbit theory leads to much less reduction. Although there are only two orbits under
the full automorphism group, any orbit graph that has a fixed peripheral vertex will have at least
$(n + 1)/2$ orbits (the corresponding symmetry is the reflection through that vertex).

5.3.4 $K_n$-$K_n$

We carried out a careful symmetry analysis for the graph $K_n$-$K_n$ in §4.3.1; see Figure 9. The transition
probability matrix on this graph has the structure

\[
P = \begin{bmatrix}
C & p_1 \mathbf{1} & 0 & 0 \\
p_1 \mathbf{1}^T & 1 - p_0 - (n-1) p_1 & p_0 & 0 \\
0 & p_0 & 1 - p_0 - (n-1) p_1 & p_1 \mathbf{1}^T \\
0 & 0 & p_1 \mathbf{1} & C
\end{bmatrix},
\]

where $C$ is a circulant matrix

\[
C = (1 - p_1 - (n-1) p_2)\, I_{n-1} + p_2 \mathbf{1}_{(n-1) \times (n-1)}.
\]


Since circulant matrices are diagonalized by Fourier matrices, we first use the transformation
matrix

\[
T_1 = \begin{bmatrix}
\mathcal{F}_{n-1} & & & \\
 & 1 & 0 & \\
 & 0 & 1 & \\
 & & & \mathcal{F}_{n-1}
\end{bmatrix},
\]

where $\mathcal{F}_{n-1}$ is the unitary Fourier matrix of dimension $n - 1$. This corresponds to block diagonalization
using the symmetry group $S_{n-1} \times S_{n-1}$, which is a subgroup of $\mathrm{Aut}(K_n\text{-}K_n)$. The matrix
$T_1^{-1} P T_1$ has diagonal blocks

\[
B_1' = \begin{bmatrix}
1 - p_1 & \sqrt{n-1}\, p_1 & 0 & 0 \\
\sqrt{n-1}\, p_1 & 1 - p_0 - (n-1) p_1 & p_0 & 0 \\
0 & p_0 & 1 - p_0 - (n-1) p_1 & \sqrt{n-1}\, p_1 \\
0 & 0 & \sqrt{n-1}\, p_1 & 1 - p_1
\end{bmatrix}
\]

and

\[
I_{2n-4} \otimes (1 - p_1 - (n-1) p_2). \tag{29}
\]

From this we know that $P$ has an eigenvalue $1 - p_1 - (n-1) p_2$ with multiplicity $2n - 4$, and
the remaining four eigenvalues are the eigenvalues of the above $4 \times 4$ block $B_1'$. The block $B_1'$
corresponds to the orbit chain under the symmetry group $H = S_{n-1} \times S_{n-1}$. More precisely,
$B_1' = \Pi^{1/2} P_H \Pi^{-1/2}$, where $\Pi = \mathrm{Diag}(\pi_H)$, and $P_H$ and $\pi_H$ are the transition probability matrix and
stationary distribution of the orbit chain shown in Figure 9(c), respectively.



Exploiting the full automorphism group of $K_n$-$K_n$, we can further block diagonalize $B_1'$. Let

\[
T_2 = \frac{1}{\sqrt{2}} \begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
0 & 1 & -1 & 0 \\
1 & 0 & 0 & -1
\end{bmatrix}.
\]

The $4 \times 4$ block $B_1'$ is decomposed into

\[
\begin{bmatrix} 1 - p_1 & \sqrt{n-1}\, p_1 \\ \sqrt{n-1}\, p_1 & 1 - (n-1) p_1 \end{bmatrix},
\qquad
\begin{bmatrix} 1 - 2 p_0 - (n-1) p_1 & \sqrt{n-1}\, p_1 \\ \sqrt{n-1}\, p_1 & 1 - p_1 \end{bmatrix}.
\]

The first block is $B_1$, which has eigenvalues $1$ and $1 - n p_1$. By Theorem 5.1, $B_1$ is related to the
orbit chain under $\mathrm{Aut}(K_n\text{-}K_n)$ (see Figure 9(b)) by the equation (26). The second $2 \times 2$ block has
eigenvalues

\[
1 - p_0 - (1/2) n p_1 \pm \sqrt{(p_0 + (1/2) n p_1)^2 - 2 p_0 p_1}.
\]

These are the eigenvalues contained in the orbit chain of Figure 9(c) but not in Figure 9(b).



In summary, the distinct eigenvalues of the Markov chain on $K_n$-$K_n$ are

\[
1, \qquad 1 - n p_1, \qquad
1 - p_0 - (1/2) n p_1 \pm \sqrt{(p_0 + (1/2) n p_1)^2 - 2 p_0 p_1}, \qquad
1 - p_1 - (n-1) p_2,
\]

where the last one has multiplicity $2n - 4$, and all the rest have multiplicity 1. To solve the FMMC
problem, we still need to solve the SDP (25). There are three blocks of matrix inequality constraints,
with sizes 2, 2, 1, respectively. Note that the total size is 5, which is exactly the size of the single
matrix inequality in the SDP (19) when we used the orbit theory to do symmetry reduction. As
we mentioned before, the huge reduction for $K_n$-$K_n$ is due to the fact that it has an irreducible
representation with high dimension $2n - 4$ and multiplicity 1 (see [BDPX05, Proposition 2.4]). In
the decomposition (24), this means a block of size 1 repeated $2n - 4$ times; see equation (29).

Since the problem has now been reduced to something much more tractable, we can even obtain
an analytic expression for the optimal transition probabilities. The optimal solution for the $K_n$-$K_n$
graph (for $n > 2$) is given by

\[
p_0^\star = \frac{(\sqrt{2} - 1)(n + \sqrt{2} - 2)}{n + 2 - 2\sqrt{2}}, \qquad
p_1^\star = \frac{2 - \sqrt{2}}{n + 2 - 2\sqrt{2}}, \qquad
p_2^\star = \frac{n - \sqrt{2}}{(n-1)(n + 2 - 2\sqrt{2})}.
\]

The corresponding optimal convergence rate is

\[
\mu^\star = \frac{n - 4 + 2\sqrt{2}}{n + 2 - 2\sqrt{2}}.
\]

For large $n$, we have $\mu^\star = 1 - \frac{6 - 4\sqrt{2}}{n} + O\!\left(\frac{1}{n^2}\right)$. This is quite close to the SLEM of a suboptimal
construction with transition probabilities

\[
p_0 = \frac{1}{2}, \qquad p_1 = p_2 = \frac{1}{2(n-1)}.
\]

As shown in [BDPX05], the corresponding SLEM is of the order $\mu = 1 - \frac{1}{3n} + O\!\left(\frac{1}{n^2}\right)$; here we
have $6 - 4\sqrt{2} \approx 0.3431$. The limiting value of the optimal transition probability between the two
clusters is $\sqrt{2} - 1 \approx 0.4142$.
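
The eigenvalue list and the convergence rate can be cross-checked numerically. The sketch below is an illustration under stated assumptions: the helper name `knkn_chain` and the sample size $n = 6$ are ours, and the closed-form values coded here, $p_0 = (\sqrt{2}-1)(n+\sqrt{2}-2)/D$, $p_1 = (2-\sqrt{2})/D$, $p_2 = (n-\sqrt{2})/((n-1)D)$ with $D = n + 2 - 2\sqrt{2}$, are taken as the optimal solution. The check confirms that these values reproduce $\mu^\star = (n - 4 + 2\sqrt{2})/D$ as the SLEM; it does not prove optimality.

```python
import numpy as np

def knkn_chain(n, p0, p1, p2):
    """Two n-cliques (vertices 0..n-1 and n..2n-1) joined by the bridge edge {n-1, n}."""
    P = np.zeros((2 * n, 2 * n))
    for offset, bridge in ((0, n - 1), (n, n)):
        for i in range(offset, offset + n):
            for j in range(i + 1, offset + n):
                # Edges touching the bridge vertex carry p1, all other clique edges carry p2.
                P[i, j] = P[j, i] = p1 if bridge in (i, j) else p2
    P[n - 1, n] = P[n, n - 1] = p0
    P += np.diag(1.0 - P.sum(axis=1))
    return P

n = 6
r2 = np.sqrt(2.0)
D = n + 2 - 2 * r2
p0 = (r2 - 1) * (n + r2 - 2) / D
p1 = (2 - r2) / D
p2 = (n - r2) / ((n - 1) * D)
mu_star = (n - 4 + 2 * r2) / D

P = knkn_chain(n, p0, p1, p2)
eigs = np.sort(np.linalg.eigvalsh(P))
slem = max(abs(eigs[0]), abs(eigs[-2]))

# Eigenvalues predicted by the block decomposition.
s = p0 + 0.5 * n * p1
root = np.sqrt(s ** 2 - 2 * p0 * p1)
predicted = np.sort(np.concatenate([
    [1.0, 1 - n * p1, 1 - s + root, 1 - s - root],
    np.full(2 * n - 4, 1 - p1 - (n - 1) * p2),
]))
print(slem, mu_star)
```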



5.3.5 Complete binary trees 



Since the automorphism groups of the complete binary trees $\mathcal{T}_n$ are given recursively (see §4.3.2),
it is also convenient to write the transition probability matrices in a recursive form. We start from
the bottom by considering the last level of branches. If we cut off the rest of the tree, the last level
has three nodes and two edges, with the transition probability matrix

\[
P_n = \begin{bmatrix}
1 - 2p_n & p_n & p_n \\
p_n & 1 - p_n & 0 \\
p_n & 0 & 1 - p_n
\end{bmatrix}.
\tag{31}
\]



For the tree with $n$ levels $\mathcal{T}_n$, the transition matrix $P_1$ can be computed from the recursion

\[
P_{k-1} = \begin{bmatrix}
1 - 2p_{k-1} & p_{k-1} e_k^T & p_{k-1} e_k^T \\
p_{k-1} e_k & P_k - p_{k-1} e_k e_k^T & 0 \\
p_{k-1} e_k & 0 & P_k - p_{k-1} e_k e_k^T
\end{bmatrix},
\qquad k = n, n-1, \ldots, 2,
\tag{32}
\]

where $e_k = [1 \; 0 \; \cdots \; 0]^T$ is a unit vector in $\mathbb{R}^{t_k}$, and $t_k = 2^{n-k+2} - 1$ is the dimension of $P_k$.

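The coordinate transformations are also best written in recursive form. Let

\[
T_n = \mathrm{Diag}(1, \mathcal{F}_2), \qquad
\mathcal{F}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},
\]

and define the matrices

\[
T_{k-1} = \mathrm{Diag}(1, \mathcal{F}_2 \otimes T_k), \qquad k = n, n-1, \ldots, 2.
\]

It is clear that all the $T_k$ are orthogonal. It is easy to verify that $T_n$ block-diagonalizes $P_n$:

\[
T_n^T P_n T_n = \begin{bmatrix}
1 - 2p_n & \sqrt{2}\, p_n & 0 \\
\sqrt{2}\, p_n & 1 - p_n & 0 \\
0 & 0 & 1 - p_n
\end{bmatrix}.
\]

In fact $T_k$ block-diagonalizes $P_k$, and the transformed matrices can be obtained recursively:

\[
T_{k-1}^T P_{k-1} T_{k-1} = \begin{bmatrix}
1 - 2p_{k-1} & \sqrt{2}\, p_{k-1} e_k^T & 0 \\
\sqrt{2}\, p_{k-1} e_k & T_k^T P_k T_k - p_{k-1} e_k e_k^T & 0 \\
0 & 0 & T_k^T P_k T_k - p_{k-1} e_k e_k^T
\end{bmatrix}
\]

for $k = n, n-1, \ldots, 2$.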

The matrix $T_1^T P_1 T_1$ has a very special structure. It has $n + 1$ distinct blocks, with sizes
$1, \ldots, n+1$, respectively. Order these blocks by increasing size as $B_1, B_2, \ldots, B_{n+1}$. The largest
block, of size $n + 1$, is

\[
B_{n+1} = \begin{bmatrix}
1 - 2p_1 & \sqrt{2}\, p_1 & & & & \\
\sqrt{2}\, p_1 & 1 - p_1 - 2p_2 & \sqrt{2}\, p_2 & & & \\
 & \sqrt{2}\, p_2 & 1 - p_2 - 2p_3 & \sqrt{2}\, p_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & \sqrt{2}\, p_{n-1} & 1 - p_{n-1} - 2p_n & \sqrt{2}\, p_n \\
 & & & & \sqrt{2}\, p_n & 1 - p_n
\end{bmatrix}.
\]

The matrix $B_n$ is the submatrix of $B_{n+1}$ obtained by removing its first row and column. The matrix $B_{n-1}$
is the submatrix of $B_{n+1}$ obtained by removing its first two rows and first two columns, and so on. The
matrix $B_1$ is just the scalar $1 - p_n$. The matrix $B_{n+1}$ appears only once, and it is related by (26)
to the orbit chain in Figure 10(a) (for this example we use $B_{n+1}$ instead of $B_1$ for notational
convenience). The eigenvalues of $B_{n+1}$ appear in $\mathcal{T}_n$ with multiplicity one. For $k = 1, \ldots, n$, the
block $B_k$ is repeated $2^{n-k}$ times. These blocks, in a recursive form, contain additional eigenvalues
of $\mathcal{T}_n$, and the numbers of their occurrences reveal the multiplicities of the eigenvalues.
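
The multiplicity pattern can be verified numerically for a small tree. In the sketch below (helper names `tree_chain` and `trailing_block` are hypothetical, and the probabilities are arbitrary feasible sample values for $n = 3$), the spectrum of the full chain on $2^{n+1} - 1$ vertices is compared with the eigenvalues of $B_{n+1}$ taken once and those of $B_k$ taken $2^{n-k}$ times.

```python
import numpy as np

def tree_chain(p):
    """Chain on the complete binary tree with n = len(p) levels; p[k-1] is used on level-k edges."""
    n = len(p)
    N = 2 ** (n + 1) - 1
    P = np.zeros((N, N))
    for child in range(1, N):                     # heap numbering: root is vertex 0
        parent = (child - 1) // 2
        level = (child + 1).bit_length() - 1      # depth of the child vertex, 1..n
        P[child, parent] = P[parent, child] = p[level - 1]
    P += np.diag(1.0 - P.sum(axis=1))
    return P

def trailing_block(p, size):
    """B_size: the trailing size x size principal submatrix of the tridiagonal matrix B_{n+1}."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    diag = np.concatenate(([1 - 2 * p[0]], 1 - p[:-1] - 2 * p[1:], [1 - p[-1]]))
    off = np.sqrt(2.0) * p
    B = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return B[n + 1 - size:, n + 1 - size:]

p = [0.3, 0.25, 0.2]      # sample values with p_{k-1} + 2 p_k <= 1
n = len(p)
P = tree_chain(p)

parts = [np.linalg.eigvalsh(trailing_block(p, n + 1))]                     # B_{n+1} appears once
for k in range(1, n + 1):
    parts.append(np.tile(np.linalg.eigvalsh(trailing_block(p, k)), 2 ** (n - k)))
predicted = np.sort(np.concatenate(parts))
actual = np.linalg.eigvalsh(P)
print(np.abs(predicted - actual).max())
```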

More specifically, we note that the orbit chain under the full automorphism group has only one
fixed point, the root vertex (see Figure 10(a)). We consider next the orbit chain that has a fixed
point in the first level of child vertices (the other child vertex in the same level is also fixed). This
is the orbit graph in Figure 10(b), which has $2n + 1$ vertices. The matrix $B_n$ contains exactly the $n$
eigenvalues that appear in this orbit chain but not in the one of Figure 10(a). These $n$ eigenvalues
each have multiplicity $2^{n-n} = 1$ in $\mathcal{T}_n$. Then we consider the orbit chain that has a fixed point in
the second level of child vertices (it also must have a fixed point in the previous level). This is the
orbit graph in Figure 10(c), which has $3n$ vertices. The matrix $B_{n-1}$ contains exactly the $n - 1$
eigenvalues that appear in this orbit chain but not in the previous one. These $n - 1$ eigenvalues
each have multiplicity $2^{n-(n-1)} = 2$. In general, for $k = 1, \ldots, n$, the size of the orbit chain that has
a fixed point in the $k$-th level of child vertices is

\[ (n + 1) + n + \cdots + (n + 1 - k) \]

(it must have a fixed point in all previous levels). Compared with the orbit chain of the $(k-1)$-th
level, the orbit chain of the $k$-th level contains $n + 1 - k$ additional eigenvalues. These are precisely the
eigenvalues of the matrix $B_{n+1-k}$, and they all appear in $\mathcal{T}_n$ with multiplicity $2^{n-(n+1-k)} = 2^{k-1}$.

Figure 14: Left: the simplest graph with no symmetry. Right: two copies joined head-to-tail.

Because of the special structure of $B_1, \ldots, B_{n+1}$, we have the following eigenvalue interlacing
result (e.g., [HJ85, Theorem 4.3.8]):

\[
\lambda_{k+1}(B_{k+1}) \le \lambda_k(B_k) \le \lambda_k(B_{k+1}) \le \lambda_{k-1}(B_k) \le \cdots \le
\lambda_2(B_k) \le \lambda_2(B_{k+1}) \le \lambda_1(B_k) \le \lambda_1(B_{k+1})
\]

for $k = 1, \ldots, n$. Thus for the FMMC problem, we only need to consider the two blocks $B_{n+1}$ and
$B_n$ (note that $\lambda_1(B_{n+1}) = 1$). In other words, we only need to consider the orbit chain with $2n + 1$
vertices in Figure 10(b). This is a further simplification over the method based on orbit theory.
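
The interlacing property, and the resulting reduction to two blocks, can be illustrated with a few lines of NumPy. The sketch below uses arbitrary feasible sample probabilities for $n = 4$; the helper names `block` and `eig_desc` are ours. It checks that the eigenvalues of consecutive trailing blocks interlace, and that the SLEM computed from $B_{n+1}$ and $B_n$ alone agrees with the one computed from all blocks.

```python
import numpy as np

p = np.array([0.3, 0.25, 0.2, 0.35])      # sample values with p_{k-1} + 2 p_k <= 1
n = len(p)

# Tridiagonal matrix B_{n+1}.
diag = np.concatenate(([1 - 2 * p[0]], 1 - p[:-1] - 2 * p[1:], [1 - p[-1]]))
off = np.sqrt(2.0) * p
B_top = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

def block(k):
    """B_k: trailing k x k principal submatrix of B_{n+1}."""
    return B_top[n + 1 - k:, n + 1 - k:]

def eig_desc(M):
    """Eigenvalues in decreasing order, so that index 0 is lambda_1."""
    return np.linalg.eigvalsh(M)[::-1]

interlaced = True
for k in range(1, n + 1):
    small, large = eig_desc(block(k)), eig_desc(block(k + 1))
    # lambda_{j+1}(B_{k+1}) <= lambda_j(B_k) <= lambda_j(B_{k+1}) for j = 1..k
    interlaced &= bool(np.all(large[1:] <= small + 1e-12) and np.all(small <= large[:-1] + 1e-12))

# SLEM from the two largest blocks only ...
slem_two_blocks = max(eig_desc(block(n))[0], -eig_desc(block(n + 1))[-1])
# ... versus the SLEM from the eigenvalues of all blocks (dropping the single eigenvalue 1).
all_eigs = np.concatenate([eig_desc(block(k)) for k in range(1, n + 2)])
nontrivial = np.delete(all_eigs, np.argmax(all_eigs))
slem_all = np.abs(nontrivial).max()
print(interlaced, slem_two_blocks, slem_all)
```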



We conjecture that the optimal transition probabilities are

\[
p_k^\star = \frac{1}{3} \left( 1 - \left( -\frac{1}{2} \right)^{k} \right), \qquad k = 1, \ldots, n.
\]

Notice that these probabilities do not depend explicitly on $n$, and so they coincide for any two
binary trees, regardless of the height. With increasing $k$, the optimal values oscillate
around and converge to $1/3$.



5.3.6 An example of Ron Graham 

We finish this section with an example raised by Ron Graham. Consider the simplest graph with
no symmetry (Figure 14, left). Take $n$ copies of this six-vertex graph and join them, head to tail,
in a cycle. By construction, this $6n$-vertex graph certainly has $C_n$ symmetry. Careful examination
reveals that the automorphism group is isomorphic to the dihedral group $D_n$ (with order $2n$). The
construction actually brings symmetry under reflections in addition to rotations (Figure 14, right).
The orbit graphs under $C_n$ and $D_n$ are shown in Figure 15.

Figure 15: Left: orbit graph with $C_n$ symmetry. Right: orbit graph with $D_n$ symmetry.



Although the automorphism group of this graph (with $6n$ vertices) is isomorphic to those
of $n$-cycles (Figure 3) and wheels (Figure 13), finding the symmetry-adapted basis for block-diagonalization
is a bit more involved. This is due to the different types of orbits we have for
this graph. The details of block-diagonalizing this type of graph are described in [FS92, §3.1]. The
diagonal blocks of the resulting matrix all have sizes no larger than $6 \times 6$. Numerical experiments
show that for $n > 3$, the fastest mixing chain seems to satisfy

\[
p_1^\star = p_4^\star = \frac{1}{2}, \qquad p_2^\star + p_3^\star = \frac{1}{2}.
\]

Intuitively, this $6n$-vertex graph is the same as modifying a $5n$-vertex cycle by adding a triangular
bump (with an additional vertex) for every 5 vertices. Recall that for a pure cycle, we have to use a
transition probability that is slightly less than $1/2$ to achieve fastest mixing; see equation (8). Here,
because of the added bumps, it seems optimal to assign transition probability $1/2$ to every edge on
the cycle ($p_1^\star$ and $p_4^\star$), except for edges being part of a bump. For the bumps, the probability $1/2$
is shared between the original edge on the cycle ($p_2^\star$) and the edge connecting to the bump points
($p_3^\star$). Moreover, we observe that as $n$ increases, $p_3^\star$ gets smaller and $p_2^\star$ gets closer to $1/2$. So for
large $n$, the added bump vertices seem to be ignored, with very small probability of being reached;
but once a bump vertex is reached, the chain stays there with high probability.

6 Conclusions

We have shown that exploiting graph symmetry can lead to significant reduction in both the
number of variables and the size of matrices in solving the FMMC problem. For special classes of
graphs such as edge-transitive and distance-transitive graphs, symmetry reduction leads to closed
form solutions in terms of the eigenvalues of the Laplacian matrix or the intersection matrix. For
more general graphs, we gave two symmetry reduction methods, based on orbit theory and block
diagonalization, respectively.

The method based on orbit theory is very intuitive, but the construction of "good" orbit chains
can be more of an art than a technique. The method of block diagonalization can be mostly automated
once the irreducible representations of the automorphism groups are generated (for small graphs,
they can be generated using software for computational discrete algebra such as GAP [gro05]).
These two approaches have an interesting connection: orbit theory gives a nice interpretation of
the diagonal blocks, while the block diagonalization approach offers theoretical insights about the
construction of the orbit chains.

The symmetry reduction method developed in this paper can be very useful in many combinatorial
optimization problems where the graph has rich symmetry properties, in particular, problems
that can be formulated as or approximated by SDP or eigenvalue optimization problems involving
weighted Laplacian matrices (e.g., [MP93, Goe97]). In addition to the reduction of problem size,
other advantages of symmetry exploitation include degeneracy removal, better conditioning and
reliability [GP04].

There is still much to do in understanding how to exploit symmetry in semidefinite programming.
The techniques presented in this paper require a good understanding of orbit theory, group
representation theory and interior-point methods for SDP. It is of practical importance to develop
general purpose methods that can automatically detect symmetries (e.g., the code nauty [McK03]
for graph automorphisms), and then exploit them in computations. A good model here is general
purpose (but heuristic) methods for exploiting sparsity in numerical linear algebra, where symbolic
operations on graphs (e.g., minimum degree permutation) reduce fill-ins in numerical factorization
(e.g., [GL81]). As a result of this work, even very large sparse optimization problems are now routinely
solved by users who are not experts in sparse matrix methods. For exploiting symmetry in
SDP, the challenges include the development of fast methods to detect large symmetry groups (for
computational purposes, it often suffices to recognize parts of the symmetries), and the integration
of algebraic methods (e.g., orbit theory and group representations) and numerical algorithms (e.g.,
interior-point methods).